• duckythescientist@sh.itjust.works · 62 points · 14 days ago

    I’m also not sure if this is obscure, but Bloom filters! It’s a structure you can add elements to and then ask whether it has seen an element before, with the answer being either “no” or “probably yes”. There’s a trade-off between the confidence of a “probably yes”, how many elements you expect to add, and how big the Bloom filter is, but it’s very space- and time-efficient. And it uses hash functions, which always make for a fun time.
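    As a rough illustration (nothing here is from the comment itself; the class name, the sizes m and k, and the double-hashing scheme are all just convenient choices), a minimal Bloom filter sketch in Python:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot table."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per "bit", for simplicity

    def _positions(self, item):
        # Derive k table positions via double hashing of one SHA-256 digest
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means definitely never added; True means "probably yes"
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("hello")
print(bf.might_contain("hello"))   # True
print(bf.might_contain("goodbye")) # almost certainly False
```

    A real implementation would derive m and k from the expected number of elements and the false-positive rate you can tolerate.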

    • lad@programming.dev · 22 points · 14 days ago

      Relevant xkcd

      in Randall's words

      Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.

    • FizzyOrange@programming.dev · 8 points · 14 days ago

      Obscure 10 years ago maybe. These days there have been so many articles about them I bet they’re more widely known than more useful and standard things like prefix trees (aka tries).

  • Vorpal@programming.dev · 21 points · edited · 14 days ago

    XOR lists are obscure and cursed but cool. And not useful on modern hardware as the CPU can’t predict access patterns. They date from a time when every byte of memory counted and CPUs didn’t have pipelines.

    (In general, all linked lists and trees built from individually allocated, pointer-chased nodes are terrible for performance on modern CPUs. Prefer vectors, or B-trees with large fanout factors. There are still some niche use cases for linked lists, for example in kernels, but unless you know exactly what you are doing you shouldn’t use linked data structures.)

    EDIT: Fixed spelling
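    Python has no raw pointers, but the XOR trick can be sketched by keeping nodes in an arena and XORing array indices instead of addresses, with index 0 playing the role of the null pointer (all names here are made up for illustration):

```python
class XorList:
    """XOR linked list sketch: each node stores prev_index XOR next_index."""
    def __init__(self):
        self.values = [None]  # slot 0 is the null sentinel
        self.links = [0]
        self.head = self.tail = 0

    def append(self, value):
        idx = len(self.values)
        self.values.append(value)
        self.links.append(self.tail)      # new tail: prev XOR null = prev
        if self.tail:
            self.links[self.tail] ^= idx  # old tail gains a next "pointer"
        else:
            self.head = idx
        self.tail = idx

    def __iter__(self):
        # Walking needs the previous index to recover the next one
        prev, cur = 0, self.head
        while cur:
            yield self.values[cur]
            prev, cur = cur, self.links[cur] ^ prev

xs = XorList()
for v in [1, 2, 3]:
    xs.append(v)
print(list(xs))  # [1, 2, 3]
```

    Starting from the tail instead of the head walks the same list backwards, which is the whole point: one “pointer” per node, traversal in both directions.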

  • Gobbel2000@programming.dev · 13 points · 14 days ago

    The CSR (compressed sparse row) format is a very simple but efficient way of storing sparse matrices, meaning matrices with a large number of zero entries, which should not all occupy memory. It has three arrays: the first holds all non-zero entries in order, read row by row; the second contains the column index of each non-zero element (and therefore has the same length as the first); the third holds, for each row, the index into the first array where that row starts, so we can tell where each new row begins.

    On sparse matrices it has optimal memory efficiency and fast lookups; the main downside is that adding or removing elements requires shifting all three arrays, so it is mostly useful for immutable data.
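    A minimal sketch in Python (function names invented for illustration; a real implementation would also binary-search the sorted column indices of a row instead of scanning them):

```python
def to_csr(dense):
    """Convert a dense row-major matrix into (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_indices, row_ptr

def csr_get(values, col_indices, row_ptr, i, j):
    """Look up entry (i, j); anything not stored is zero."""
    for k in range(row_ptr[i], row_ptr[i + 1]):
        if col_indices[k] == j:
            return values[k]
    return 0

dense = [[5, 0, 0],
         [0, 0, 3],
         [0, 7, 0]]
vals, cols, rows = to_csr(dense)
print(vals, cols, rows)                 # [5, 3, 7] [0, 2, 1] [0, 1, 2, 3]
print(csr_get(vals, cols, rows, 2, 1))  # 7
```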

    • jxk@sh.itjust.works · 2 points · 12 days ago

      Oh yeah that’s a good one

      And also, if you’re representing a 0/1 matrix, you can just do away with the first array (the values) altogether.

      • Gobbel2000@programming.dev · 1 point · 12 days ago

        Right, which occurs in particular when dealing with graphs, which are basically matrices and usually sparse. Large graphs are what I used this format for; however, I also needed edge weights, so the values array was still required for that.

  • litchralee@sh.itjust.works · 12 points · 14 days ago

    IMO, circular buffers with two advancing pointers are an awesome data structure for high performance compute. They’re used in virtualized network hardware (see virtio) and minimizing Linux syscalls (see io_uring). Each ring implements a single producer, single consumer queue, so two rings are usually used for bidirectional data transfer.

    It’s kinda obscure because the need for asynchronous-transfer queues doesn’t show up that often unless you’re dealing with hardware or crossing outside of a single CPU. But it’s becoming more relevant due to coprocessors (i.e. small ARM CPUs attached to a main CPU) that process offloaded requests and then quickly return the result when ready.
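    As a toy model of the pattern (plain Python, so it ignores the memory-ordering barriers real lock-free rings need; names are illustrative): free-running head/tail indices with a power-of-two capacity, the producer writing only the tail and the consumer writing only the head:

```python
class SpscRing:
    """Single-producer, single-consumer ring buffer sketch."""
    def __init__(self, capacity=8):
        assert capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.buf = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item):
        if self.tail - self.head == len(self.buf):
            return False  # full
        self.buf[self.tail & self.mask] = item
        self.tail += 1    # real code needs a release store here
        return True

    def pop(self):
        if self.head == self.tail:
            return None   # empty
        item = self.buf[self.head & self.mask]
        self.head += 1    # and acquire/release pairing here too
        return item

q = SpscRing(capacity=4)
for i in range(3):
    q.push(i)
print(q.pop(), q.pop())  # 0 1
```

    Because each side owns exactly one index, no lock is needed: the queue is always in a consistent state from both ends, which is what makes the pattern work across a hardware or CPU boundary.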

    • notabot@piefed.social · 2 points · 14 days ago

      I came here to mention these too. One addition that can be helpful in large trees is to add a depth attribute to each node so that you can easily limit the depth of subtree you retrieve.

  • myfavouritename@beehaw.org · 8 points · 14 days ago

    I get way more use out of Doubly Connected Edge Lists (DCEL) than I ever thought I would when I first learned about them in school.

    When I want to render simple stuff to the screen, built-in functions like ‘circle’ or ‘line’ work. But for any shapes more complicated than that, I often find that it’s useful to work with the data in DCEL form.
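    A tiny half-edge sketch of the structure (names are illustrative; a full DCEL also populates the twin pointers and keeps per-face and per-vertex records):

```python
from dataclasses import dataclass

@dataclass
class HalfEdge:
    origin: tuple              # vertex this half-edge leaves, e.g. (x, y)
    twin: "HalfEdge" = None    # opposite half-edge (unused in this tiny demo)
    next: "HalfEdge" = None    # next half-edge around the same face

def make_triangle(a, b, c):
    """Build one triangular face as a cycle of three half-edges."""
    e1, e2, e3 = HalfEdge(a), HalfEdge(b), HalfEdge(c)
    e1.next, e2.next, e3.next = e2, e3, e1
    return e1

face = make_triangle((0, 0), (1, 0), (0, 1))
edge, verts = face, []
while True:                    # walk the face boundary via `next` pointers
    verts.append(edge.origin)
    edge = edge.next
    if edge is face:
        break
print(verts)  # [(0, 0), (1, 0), (0, 1)]
```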

  • solomonschuler@lemmy.zip · 6 points · 14 days ago

    Skip lists are interesting data structures. The underlying mechanism is a multi-level probabilistic linked list: each node gets some height ‘h’, and the level-i list links only the nodes whose height reaches level i, so the upper levels skip over most nodes. A traditional linked list has to traverse every value it stores; a skip list instead starts from the maxLevel, follows “next” pointers while the next key is smaller than the key being searched for, and moves down a level whenever “next” is a nullptr or points to a greater key. This reduces the expected search time from O(n) for a linked list to O(log n).

    The reason the heights are probabilistic (chosen with a pseudo-random number generator) is that it makes inserting and removing elements easy; with the idealized, perfectly balanced layout you would have to reconstruct the entire data structure every time you add or remove an element.

    In my testing with 1,000,000 elements, a search dropped from 6s in a linked list to less than 1s in a skip list!
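    For concreteness, a small illustrative skip list (dict-based nodes, p = 0.5, capped at 16 levels; this is a generic sketch, not the code behind those timing numbers):

```python
import random

class SkipList:
    """Each node gets a random height; level i links skip shorter nodes."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = {"key": None, "next": [None] * self.MAX_LEVEL}

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        update, node = [None] * self.MAX_LEVEL, self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node["next"][i] and node["next"][i]["key"] < key:
                node = node["next"][i]
            update[i] = node  # last node before `key` on level i
        new = {"key": key, "next": [None] * self._random_level()}
        for i in range(len(new["next"])):
            new["next"][i] = update[i]["next"][i]
            update[i]["next"][i] = new

    def contains(self, key):
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node["next"][i] and node["next"][i]["key"] < key:
                node = node["next"][i]
        node = node["next"][0]
        return node is not None and node["key"] == key

sl = SkipList()
for k in [3, 7, 1, 9]:
    sl.insert(k)
print(sl.contains(7), sl.contains(4))  # True False
```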

    • tyler@programming.dev · 5 points · 14 days ago

      Aren’t these one of the first structures you learn about in any comp sci course? Still good to know but not sure it’s obscure.

  • palordrolap@fedia.io · 6 points · 14 days ago

    An ultimately doomed one that existed in Perl for a while was the pseudohash. They were regular integer-indexed arrays that could be accessed as though they were hashes (aka associative arrays / dictionaries). They even made it into the main Perl books at the time as this awesome time saving device. Except they weren’t.

    I did a quick web search just now, and someone did a talk about why they weren’t a great idea; they tell it better than I could: https://perl.plover.com/classes/pseudohashes/

    The supplied video doesn’t have great sound quality, and it might be better to just click through the slides under Outline at the bottom there.

  • Atlas_@lemmy.world · 5 points · 13 days ago

    Fibonacci heaps are pretty cool. Not used very often because they’re awful to implement, but they have better asymptotic complexity than many other heaps.

    Also Binary Lifting is closer to an algorithm than a data structure but it’s used in Competitive Programming a fair bit, and isn’t often taught: https://cp-algorithms.com/graph/lca_binary_lifting.html

    And again closer to an algo than a data structure, but Sum over Subsets DP in 3^n also has a cool little bit of math in it: https://cp-algorithms.com/algebra/all-submasks.html
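    The core trick in that last link is enumerating all submasks of a mask m with s = (s - 1) & m, which summed over all masks of n bits comes to 3^n steps in total:

```python
def submasks(m):
    """Yield every submask of m, from m itself down to 0."""
    s = m
    while s:
        yield s
        s = (s - 1) & m  # decrement, then keep only the bits of m
    yield 0              # the empty submask counts too

print(list(submasks(0b101)))  # [5, 4, 1, 0]
```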

  • Trigger2_2000@sh.itjust.works · 4 points · 13 days ago

    Not really a data structure per se, but just knowing LISP and the interesting structures it uses internally.

    The results of the LISP operations CAR, CDR, CADR, and the other one I can’t remember now.
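    For anyone who hasn’t met them: CAR returns the first half of a cons cell, CDR the second, and names like CADR are compositions (CAR of CDR, i.e. the second list element). A Python sketch with tuples standing in for cons cells:

```python
def cons(a, d): return (a, d)    # a cons cell is just a pair
def car(p): return p[0]          # first half
def cdr(p): return p[1]          # second half
def cadr(p): return car(cdr(p))  # second element of a list

lst = cons(1, cons(2, cons(3, None)))  # the list (1 2 3), nil = None
print(car(lst), cadr(lst))  # 1 2
```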

  • idunnololz@lemmy.world · 4 points · 14 days ago

    I personally don’t think it’s that obscure but I have never seen this used in production code that I didn’t write: the linked hash map or ordered hash map.
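    For reference (general background, not something from the comment): Java ships it as java.util.LinkedHashMap, and in Python plain dicts have preserved insertion order since 3.7, with collections.OrderedDict adding explicit reordering operations on top:

```python
from collections import OrderedDict

d = OrderedDict()
d["b"] = 1
d["a"] = 2
d["c"] = 3
d.move_to_end("b")  # a reordering operation plain dicts lack
print(list(d))      # ['a', 'c', 'b']
```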

  • JackbyDev@programming.dev · 4 points · 13 days ago

    B-trees are cool, though not necessarily obscure. I didn’t learn about them in college. The name sounds like “binary tree”, and the two are related but different: a B-tree packs many keys into each node, a data structure designed to take advantage of the way disk reads work (a whole block at a time).
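    A search-only sketch of the idea (hand-built tree, illustrative names): each node holds many sorted keys, so every step down, which would cost one block read from disk in a real B-tree, narrows the search by a large fanout:

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys in this node
        self.children = children or []    # empty for leaf nodes

def search(node, key):
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                            # find the first key >= `key`
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:
        return False                      # reached a leaf without finding it
    return search(node.children[i], key)  # descend into the right subtree

root = Node([10, 20], [Node([2, 5, 7]), Node([12, 15]), Node([25, 30])])
print(search(root, 15), search(root, 8))  # True False
```

    A real B-tree also rebalances on insert by splitting full nodes, which keeps every leaf at the same depth.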