• 0 Posts
  • 27 Comments
Joined 10 months ago
Cake day: May 9th, 2024

  • No problem. I think this is a great “final boss” question for learning sed, because it turns out it is deceptively hard!! You have to understand not only a lot about regex, but about sed to get it right. I learned a lot about sed just by tackling this problem!

    I really do not want to mess around with your regex

    It is very delicate for sure, but one part you can safely change is the # Add hyphens part. In that regex you can see (%20|\.). This is the list of “characters” that get converted to hyphens. For example, you could modify it to (%20|\.|\+) and it would convert +s to -s as well!

    Still it is not perfect:

    • If the link spans multiple lines, the regex won’t match
    • If the link contains escaped characters like \\\\\[LINK](#LINK) or [LINK\]\\\\](#LINK), the regex won’t handle it correctly
    • If the link is inside a code block ``` it will get changed (which may or may not be intended)

    But for a sed-only solution this is about as good as it will get I’m afraid.

    Overall I’m very happy with it. Someday I would like to make a video that goes into depth about sed, since it is tricky to learn just from the docs.
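
For anyone who wants to see the intended transformation outside of sed, here is a minimal Rust sketch of what the script is meant to do to a single link target. It is not the sed solution itself: `rewrite_target` is a hypothetical name, and it assumes the input is just the text between the `(` and `)` of a markdown link.

```rust
/// Rewrite one link target: external links stay as-is; for internal links,
/// only the part after the first `#` is hyphenated and lowercased.
fn rewrite_target(target: &str) -> String {
    // Same idea as the /^https?:/ check in the sed script.
    if target.starts_with("http:") || target.starts_with("https:") {
        return target.to_string();
    }
    match target.split_once('#') {
        Some((file, fragment)) => {
            // Same list as (%20|\.) in the "Add hyphens" step, then lowercase.
            let fragment = fragment
                .replace("%20", "-")
                .replace('.', "-")
                .to_lowercase();
            format!("{file}#{fragment}")
        }
        // No fragment, nothing to rewrite.
        None => target.to_string(),
    }
}

fn main() {
    for target in [
        "readme.md#Other%20Page",
        "#Header%20Linking%20MARKDOWN.md",
        "https://example.com/#Some%20Link",
    ] {
        println!("{target} -> {}", rewrite_target(target));
    }
}
```

Adding another character to the list (like the `+` example above) would be one more `.replace('+', "-")` in the chain.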


  • I did it!! It also handles the case where an external link and internal link are on the same line :D

    sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'
    

    Here is my annotated file

    # Begin loop
    :l;
    
    # Bisect the pattern space at the first link: the text up to and including
    # the link's `(' is appended to hold space, the rest stays in pattern space
    # Example: `text [label](file#fragment)'
    #   Pattern space: `file#fragment)'
    #   Hold space: `text [label]('
    # Steps:
    #   1. Strategically insert \n
    #       1a. If this fails, branch out
    #   2. Append to hold space (this creates two \n's. It feels weird for the
    #      first iteration, but that's ok)
    #   3. Copy hold space to pattern space, remove first \n, then trim off
    #      everything past the second \n
    #   4. Swap pattern/hold, and trim off everything up to and incl the last \n
    s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;
    Te;
    H;
    g; s/\n//; s/\n.*//;
    x; s/.*\n//;
    
    # Modify only if it is an internal link
    /^https?:/! {
        # Add hyphens
        :h;
        s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;
        th;
        # Make lowercase
        s/(#[^)]*\))/\L\1/;
    };
    
    # "conditional" branch back to the start of the loop to look for the next
    # link. A substitution has always succeeded by this point, so it always jumps
    tl;
    
    # Exit: join pattern space to hold space, then move to pattern space.
    # Since the loop uses H instead of h, have to make sure hold space is empty
    :e;
    H;
    z;
    x; s/\n//;
    


  • Why you assume there’s only one link in the line?

    They did not want external (http) links to be modified as that would break it:

    • [Example](https://example.com/#Some%20Link)
    • [Example](https://example.com/#some-link)

    I compromised by assuming that it is unlikely enough to have an external http link AND an internal link on the same line. You could probably still do it; my first thought was [^h][^t][^t][^p], but that would cause issues for #ttp and #A, so I just gave up. Instead I think you’d want a different approach, like breaking each link onto its own line, doing the same external/internal check before the substitution, and joining the lines afterward.
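
The "handle each link on its own" idea can be sketched like this, in Rust rather than sed. `map_link_targets` and `lowercase_internal` are hypothetical names, the transformation is only a stand-in, and the scan just looks for `](` followed by `)`, so it assumes there are no escaped brackets or parentheses.

```rust
/// Apply `f` to the target of every `[label](target)` on one line,
/// leaving the rest of the line untouched.
fn map_link_targets(line: &str, f: impl Fn(&str) -> String) -> String {
    let mut out = String::new();
    let mut rest = line;
    // "](" ends a label and starts a target.
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                out.push_str(&rest[..start + 2]); // text up to and including "]("
                out.push_str(&f(&after[..end])); // rewritten target
                rest = &after[end..]; // continue from the closing ")"
            }
            None => break, // unterminated link: keep the remainder as-is
        }
    }
    out.push_str(rest);
    out
}

/// Stand-in transformation: the external/internal check is made per link.
fn lowercase_internal(target: &str) -> String {
    if target.starts_with("http:") || target.starts_with("https:") {
        target.to_string()
    } else {
        target.to_lowercase()
    }
}

fn main() {
    let line = "See [ext](https://example.com/#Some%20Link) and [int](README.md#Other)";
    println!("{}", map_link_targets(line, lowercase_internal));
}
```

Because the check happens per target, an external and an internal link on the same line no longer interfere with each other.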

    Also, you perform substitutions in the whole URL instead of the fragment component

    That requirement I missed. I just assumed the filename would be replaced the same way too Lol. Not too hard to fix tho :)


  • Annotated, it works like this:

    # use a loop to iteratively replace the %20 with -, since doing s/%20/-/g would replace too much. we loop until it can't substitute any more
    
    # label for looping
    :loop;
    # capture each part of the link, and join it together with -
    # the address in front skips the substitute command if the line contains an http link in markdown format
    # (the address and its command have to stay on the same line, otherwise sed applies the address to the comment and errors out)
    /\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g;
    # if the substitution made a change, loop again, otherwise break
    t loop;
    
    # convert the inside of each link to lowercase if the line doesn't contain an http link
    # this is outside the loop rather than in the s command above because if the link doesn't contain %20 at all then it wouldn't be converted to lowercase
    /\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g
    

  • This is very close

    sed ':loop;/\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g;t loop;/\[[^]]*\](http/! s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g'
    

    example file

    [Some text](#Header%20Linking%20MARKDOWN.md)
    (#Should%20stay%20as%20is.md)
    Text surrounding [a link](readme.md#Other%20Page). Cool
    Multiple [links](#Links.md) in (%20) [a](#An%20A.md) SINGLE [line](#Lines.md)
    Do [NOT](https://example.com/URL%20Should%20Be%20Untouched.html) CHANGE%20 [hyperlinks](http://example.com/No%20Touchy.html)
    

    but it doesn’t work if you have an http link and a markdown link on the same line, and it doesn’t work with [escaped \] square brackets](#and-escaped-\)-parenthesis) in the link

    but!! it was fun!




  • Something I didn’t know for a long time (even though I’m pretty sure it’s mentioned in the book) is that tuple-like enum variants work like functions

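    #[derive(Debug, PartialEq, Eq)]
    enum Foo {
        Bar(i32),
    }
    
    fn main() {
        // `Foo::Bar` acts as a function `fn(i32) -> Foo`, so it can be passed to `map` directly
        let x: Vec<_> = [1, 2, 3]
            .into_iter()
            .map(Foo::Bar)
            .collect();
        assert_eq!(
            x,
            vec![Foo::Bar(1), Foo::Bar(2), Foo::Bar(3)]
        );
    }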
    

    Not too crazy, but it’s something that blew my mind when I first saw it


  • I’m tricking the Nintendo Switch into thinking my computer is a Bluetooth Pro Controller. I’m using a crate called bluer which exposes bindings to the BlueZ stack and it’s been great to use.

    I got to the point where it pairs the controller and hits B to exit. However, it doesn’t seem to accept any more button presses after that… :) So I have some ways to go.

    I’ve also needed a project where I can challenge myself with the basics of async without it being overwhelming, and I think this hits the sweet spot. It’s my first time using tokio spawn, join, and select in a real project!


  • My reasons were more hardware related. When I was a bit younger my parents gave me a netbook which had 32 GB of storage, and Windows used almost all of it. I wanted to do creative projects in my free time, but I couldn’t install programs or save any of my work. I would often restart to clear log files and gain a bit more working storage, which was extremely annoying because it took like 5 mins for the computer to finally settle down and be usable.

    I eventually got a 32GB flash drive which helped a lot, but it was not enough. With 4GB of RAM I could only have about 3 browser tabs open, and not all the programs I wanted could be run off the flash drive. It was still resource management hell.

    Somehow, some way, I learned about Linux. I got a 128GB microSD and put Mint on it. It truly set me free. I could install the software I wanted, I could make the things I wanted to make, I could open more programs at once, and I could do it all without unbearable lag. I have never looked back since.





  • You might be okay with this:

    macro_rules! span {
        ($line:expr, $column:expr) => {
            Span {
                line: $line,
                column: $column,
                file_path: None,
            }
        };
        ($line:expr, $column:expr, $file_path:literal) => {
            Span {
                line: $line,
                column: $column,
                file_path: Some($file_path.to_string()),
            }
        };
        ($line:expr, $column:expr, $file_path:expr) => {
            Span {
                line: $line,
                column: $column,
                file_path: $file_path,
            }
        };
    }
    

    Playground

    However, sometimes I don’t want to pass in the file path directly but through a variable that is Option<String>.

    Essentially I took this to mean str literals will be auto wrapped in Some, but anything else is expected to be Option<String>
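
To make that reading concrete, here is a small self-contained sketch that exercises all three arms. The `Span` struct is an assumption (the field types are guessed from the macro body), and the macro is repeated only so the snippet compiles on its own.

```rust
// Assumed definition: the field types are a guess based on how the macro fills them.
#[derive(Debug, PartialEq)]
struct Span {
    line: usize,
    column: usize,
    file_path: Option<String>,
}

macro_rules! span {
    ($line:expr, $column:expr) => {
        Span { line: $line, column: $column, file_path: None }
    };
    // A literal such as "main.rs" is wrapped in Some(...) automatically.
    ($line:expr, $column:expr, $file_path:literal) => {
        Span { line: $line, column: $column, file_path: Some($file_path.to_string()) }
    };
    // Anything else must already be an Option<String>.
    ($line:expr, $column:expr, $file_path:expr) => {
        Span { line: $line, column: $column, file_path: $file_path }
    };
}

fn main() {
    let no_path = span!(1, 2);
    let from_literal = span!(3, 4, "main.rs");
    let path: Option<String> = Some("lib.rs".to_string());
    let from_variable = span!(5, 6, path);
    println!("{no_path:?}\n{from_literal:?}\n{from_variable:?}");
}
```

The order of the arms matters: the `literal` arm has to come before the `expr` arm, because a string literal is also a valid expression and would otherwise be passed through unwrapped. A `&str` variable still lands in the `expr` arm and fails to type-check.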



  • More progress on the Finite Projective Plane (incidence matrix) generation from last week. There already exists an algorithm to generate boards of order p+1 where p is prime. It is stateless, so with CUDA we can generate huge boards in seconds since all you need is the x, y position and board size. 258x258 under 3s!

    However, p+1 isn’t the only sequence. From our observations, it seems that the Fermat numbers also generate valid boards, using our “naïve” algorithm.

    Unfortunately 3x3, 5x5, and 17x17 might not contain all the nuggets of generality to find a nice algorithm like the p+1, so we’re gonna generate the next up: 257x257. We’ve been improving the naïve algorithm since it is too slow. (The resulting image would be 65793x65793)

    • Rather than allocating the 2d boolean grid, we represent where the true elements would be using row and column indexes. This is okay because of the constraint which limits how many true elements can be in a row/column
      • benefit 1 — less memory usage: “O(2n)” vs O(n²) ((for 257x257: 129MiB vs 4GiB))
      • benefit 2 — faster column-major lookups (flamegraph spent a lot of time sitting in iterators)
      • overall speedup: about 2.7x
    • Speed up index lookup with binary search
      • The index list is sorted by nature. To exhaustively check a dot is valid, it checks n² spots in 2 lists of size n. Slightly more expensive than the grid given the 2 index lists. Rather than slice::contains, use slice::binary_search(...).is_ok()
      • overall speedup: about 2.1x
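
A rough sketch of the index-list idea and of where the numbers above could come from. Everything here is an assumption for illustration: the `SparseRows` layout, `u32` indices, one byte per grid cell, and reading "257x257" as 257 true cells per row (a projective plane of order n has n² + n + 1 points and n + 1 points per line, so n = 256). The tiny board used as data is the Fano plane (order 2).

```rust
/// For each row, the sorted column indices of its `true` cells.
struct SparseRows {
    rows: Vec<Vec<u32>>,
}

impl SparseRows {
    /// Linear scan, like `slice::contains`.
    fn is_set_linear(&self, row: usize, col: u32) -> bool {
        self.rows[row].contains(&col)
    }

    /// Binary search; only valid because each index list is kept sorted.
    fn is_set(&self, row: usize, col: u32) -> bool {
        self.rows[row].binary_search(&col).is_ok()
    }
}

/// Side length of the incidence matrix for a plane of order `n`.
fn plane_size(n: u64) -> u64 {
    n * n + n + 1
}

/// Bytes for a dense grid, assuming one byte per cell.
fn dense_bytes(side: u64) -> u64 {
    side * side
}

/// Bytes for row plus column index lists, assuming 4-byte (u32) indices.
fn sparse_bytes(side: u64, per_line: u64) -> u64 {
    2 * side * per_line * 4
}

/// The 7 lines of the Fano plane, each listing its 3 points in sorted order.
fn fano() -> SparseRows {
    SparseRows {
        rows: vec![
            vec![0, 1, 2], vec![0, 3, 4], vec![0, 5, 6], vec![1, 3, 5],
            vec![1, 4, 6], vec![2, 3, 6], vec![2, 4, 5],
        ],
    }
}

fn main() {
    let board = fano();
    println!("row 3 contains column 5: {}", board.is_set(3, 5));
    println!("linear scan agrees: {}", board.is_set_linear(3, 5));

    let side = plane_size(256);
    println!("side: {side}");
    println!("dense: {} GiB", dense_bytes(side) / (1024 * 1024 * 1024));
    println!("sparse: {} MiB", sparse_bytes(side, 257) / (1024 * 1024));
}
```

Under these assumptions the arithmetic lines up with the 65793 side length and the 129MiB vs 4GiB figures quoted above, but the real implementation may lay things out differently.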

    Next steps:

    • Assume a square grid and exploit its diagonal symmetry to treat row lookups as column lookups
    • Use multi threading to gain a partial speedup
      • Essentially if row 1 is 50% completed, row 2 can be up to 50% completed.
      • I think you get different speeds depending whether the threads and symmetry folds are both row/column major or one is row-major and the other is column-major. My gut says both need to be aligned because there’s less waiting involved.

  • Apparently generating “Finite Projective Planes”. For context on how I got here, I went camping with my family and brought the game Spot It. My brother was analyzing it and came up with the same type of pattern.

    When we got home he made a python script to generate these boards, but it was quite slow, so he half joked asking me to rewrite it in Rust.

    I kinda struggled a bit since I didn’t fully understand what it was doing. Near the end I even got a segfault using safe code😃! (I was spawning a thread with a large stack size, and allocating huge slices on its stack, rather than you know… boxing the slice Lol.) When I finally got it working, it ended up being in the ballpark of a 23x speedup. Not bad for changing the language choice!

    There’s lots of room for improvement left for sure. The algorithm could benefit from some running statistics about cols/rows, and the algorithm itself is quite naïve and could maybe be improved too :P


  • tuna@discuss.tchncs.de to Science Memes@mander.xyz · I just cited myself. · 8 months ago (edited)

    If they aren’t equal, there should be a number in between that separates them. Between 0.1 and 0.2 I can come up with 0.15. Between 0.1 and 0.15 is 0.125. You can keep going, but if the numbers are equal, there is nothing in between. There’s no gap between 0.1 and 0.1, so they are equal.

    What number comes between 0.999… and 1?

    (I used to think it was imprecise representations too, but this is how it made sense to me :)
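
One way to write the "no gap" argument down, as a sketch: cut the decimal off after n nines and look at what is left over.

```latex
\[
1 - 0.\underbrace{99\ldots9}_{n\ \text{nines}} = 10^{-n}
\qquad\Longrightarrow\qquad
0 \le 1 - 0.999\ldots \le 10^{-n} \quad \text{for every } n \ge 1 .
\]
```

The only non-negative number that is smaller than every 10^{-n} is 0, so the gap is 0 and there is no room for a number in between.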


  • tuna@discuss.tchncs.de to Science Memes@mander.xyz · Elsevier · 8 months ago

    Imagine they have an internal tool to check if the hash exists in their database, something like

    "SELECT user FROM downloads WHERE hash = '" + hash + "';"
    

    If you set the PDF hash to 1'; DROP TABLE books;-- and they scan it, the query effectively deletes their entire business lmfaoo.
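
To see why that works, here is a tiny sketch of the string concatenation on its own, with no database involved. `build_query` is a hypothetical stand-in for the imagined internal tool, and whether the second statement would actually run depends on the database and driver allowing multiple statements per query.

```rust
/// Build the query by naive string concatenation, as in the imagined tool.
fn build_query(hash: &str) -> String {
    format!("SELECT user FROM downloads WHERE hash = '{hash}';")
}

fn main() {
    // A normal hash stays inside the quotes.
    println!("{}", build_query("abc123"));
    // The crafted "hash" closes the quote early, adds its own statement,
    // and comments out the leftover `';`.
    println!("{}", build_query("1'; DROP TABLE books;--"));
}
```

Passing the hash as a bound parameter instead of splicing it into the SQL text is the usual way to keep the input from being read as SQL.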

    Another idea might be to duplicate the PDF many times and insert bogus metadata for each. Then submit requests saying that you found an illegal distribution of the PDF. If their process isn’t automated it would waste a lot of time on their part to find the culprit Lol

    I think it’s more interesting to think of how to weaponize their own hash rather than deleting it