I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with the technical concepts behind databases).

With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I’d love it if you could point me to technical documentation for that.

  • valtia@lemmy.world
    7 days ago

    There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
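
A minimal sketch of the history-row idea above, using Python's built-in `sqlite3`. The `person_history` table, its columns, and the sample values are all hypothetical and only illustrate the pattern; nothing here reflects the SSA's actual schema.

```python
import sqlite3


def build_person_history():
    """Return an in-memory database holding a history table (SCD Type 2 style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE person_history (
            row_id     INTEGER PRIMARY KEY,  -- surrogate key, unique per row
            ssn        TEXT NOT NULL,        -- deliberately NOT unique
            full_name  TEXT NOT NULL,
            valid_from TEXT NOT NULL,
            valid_to   TEXT                  -- NULL marks the current row
        )
        """
    )
    conn.executemany(
        "INSERT INTO person_history (ssn, full_name, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?)",
        [
            ("000-00-0001", "Jane Doe", "1990-01-01", "2015-06-30"),
            ("000-00-0001", "Jane Smith", "2015-07-01", None),  # name change
            ("000-00-0002", "John Roe", "1985-03-12", None),
        ],
    )
    return conn


def ssns_with_multiple_rows(conn):
    """SSNs that appear more than once: history rows, not extra people."""
    return conn.execute(
        "SELECT ssn, COUNT(*) FROM person_history "
        "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
    ).fetchall()


def current_names(conn):
    """Only the current row per SSN (valid_to IS NULL)."""
    return conn.execute(
        "SELECT ssn, full_name FROM person_history "
        "WHERE valid_to IS NULL ORDER BY ssn"
    ).fetchall()
```

A naive "find duplicate SSNs" query flags Jane here, even though the table describes one person with one name change.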

    Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases with their own structure and design, which do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life the SSN on your card has dashes, and those dashes make the number a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal db uses a number field instead of text), there can be some data loss, resulting in a NULL.
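
To make the string-to-number failure concrete, here is a small sketch. The function names are made up, and whether any real import job behaves like this is an assumption; it only shows how a strict numeric conversion can turn a dashed SSN into a NULL (`None` in Python).

```python
def to_numeric_ssn(raw):
    """Mimic a strict numeric column: return an int, or None (NULL) on failure."""
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # "000-12-3456" is not a valid integer literal, so the value is lost.
        return None


def to_numeric_ssn_cleaned(raw):
    """Strip dashes first; only accept exactly nine digits."""
    if raw is None:
        return None
    digits = raw.replace("-", "").strip()
    if digits.isdigit() and len(digits) == 9:
        return int(digits)
    return None
```

Even the cleaned version loses information: `"000-12-3456"` becomes the integer `123456`, so leading zeros disappear. That is one reason identifiers like this are often kept as text.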

    • DarthKaren@lemmy.world
      6 days ago

      JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.

      • GoodEye8@lemm.ee
        6 days ago

        Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about, because we know that due to human error SSNs are not unique and you can’t enforce uniqueness on SSNs without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
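
A rough sketch of that “previous names” design, assuming hypothetical `person` and `previous_name` tables (the names and columns are illustrative only):

```python
import sqlite3


def build_names_db():
    """Main table holds only the current name; old names live in a side table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id    INTEGER PRIMARY KEY,
            ssn          TEXT NOT NULL,
            current_name TEXT NOT NULL
        );
        CREATE TABLE previous_name (
            person_id  INTEGER NOT NULL REFERENCES person(person_id),
            old_name   TEXT NOT NULL,
            changed_on TEXT NOT NULL
        );
        """
    )
    return conn


def rename_person(conn, person_id, new_name, changed_on):
    """Archive the current name, then overwrite it on the main row."""
    (old_name,) = conn.execute(
        "SELECT current_name FROM person WHERE person_id = ?", (person_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO previous_name VALUES (?, ?, ?)",
        (person_id, old_name, changed_on),
    )
    conn.execute(
        "UPDATE person SET current_name = ? WHERE person_id = ?",
        (new_name, person_id),
    )
```

With this layout the SSN appears once in `person`, and the history is still there, just in a different table.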

    • DreamlandLividity@lemmy.world
      7 days ago

      Another accusation Elon made was that payments are going to people missing SSNs.

      A much simpler answer is that not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.

      • lovely_reader@lemmy.world
        7 days ago

        It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.

  • nednobbins@lemm.ee
    6 days ago

    It’s so basic that documentation is completely unnecessary.

    “De-duping” could mean multiple things, depending on what you mean by “duplicate”.

    It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.

    It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.

    A database system as large as the one the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
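
The two meanings of “duplicate” above can be shown with two different queries. This is a sketch over a hypothetical `contribution` log table; the data is invented.

```python
import sqlite3


def build_contributions():
    """A tiny transaction log where the same SSN legitimately repeats."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contribution (ssn TEXT, paid_on TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO contribution VALUES (?, ?, ?)",
        [
            ("000-00-0001", "2024-01-31", 10000),
            ("000-00-0001", "2024-02-29", 10000),
            ("000-00-0001", "2024-02-29", 10000),  # exact repeat of the row above
            ("000-00-0002", "2024-01-31", 25000),
        ],
    )
    return conn


def exact_duplicate_rows(conn):
    """Meaning 1: every column matches another row."""
    return conn.execute(
        """
        SELECT ssn, paid_on, amount_cents, COUNT(*)
        FROM contribution
        GROUP BY ssn, paid_on, amount_cents
        HAVING COUNT(*) > 1
        """
    ).fetchall()


def repeated_ssns(conn):
    """Meaning 2: one column's value shows up more than once."""
    return conn.execute(
        "SELECT ssn, COUNT(*) FROM contribution GROUP BY ssn HAVING COUNT(*) > 1"
    ).fetchall()
```

The second query flags every regular contributor, which is exactly why “the SSN appears more than once” says nothing about fraud on its own.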

      • nednobbins@lemm.ee
        6 days ago

        Yeah. And the fix for that has nothing to do with “de-duping” as a database operation either.

        The main components would probably be:

        1. Decide on a new scheme (with more digits)
        2. Create a mapping from the old scheme to the new scheme. (that’s where existing duplicates would get removed)
        3. Let people use both during some transition period, after which the old one isn’t valid any more.
        4. Decide when you’re going to stop issuing old SSNs and only issue new ones to people born after some date.

        There’s a lot of complication in each of those steps but none of them are particularly dependent on “de-duped” databases.
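
Step 2 (the mapping) could be sketched like this. Everything here is hypothetical: the 12-digit sequential format, the function name, and the idea of keying on the whole holder record are assumptions made only for illustration.

```python
from itertools import count


def build_mapping(holders):
    """Map each holder record to a fresh, wider identifier.

    `holders` is a list of (old_ssn, full_name, date_of_birth) tuples.
    Two people who share an old SSN are distinct records, so each one
    gets its own new ID.
    """
    next_id = count(1)
    mapping = {}
    for holder in holders:
        if holder not in mapping:
            mapping[holder] = f"{next(next_id):012d}"
    return mapping
```

The point of the sketch is that existing duplicates are resolved by assigning new identifiers, not by deleting rows.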

      • DacoTaco@lemmy.world
        6 days ago

        Just read the format of the US SSN in that Wikipedia article. That wasn’t a smart format to use lol. It only supports 99*999 (+/- 100k) people per area code. No wonder numbers are reused.
        In some countries it’s birthday + sequence number encoded with gender + checksum, and that has been working since the 80’s.
        Before that it was a different number, but it wasn’t future proof (much like the US SSN), so we migrated away in the 80’s :')

        • Wispy2891@lemmy.world
          6 days ago

          In my country the only way that someone has the same number is if someone was born on the same day (±1 century), in the same city and has the same name and family name. It is extremely difficult to have duplicates that way (exception: immigrants, because the “city code” is the same for the whole foreign country, so it’s not impossible that there are two Ananya Guptas born on the same day in the whole of India)

          • DacoTaco@lemmy.world
            6 days ago

            Oh yeah, our system wouldn’t fit India, as it’s limited to 500 births a day (the sequence is 3 digits, and whether it’s even or odd indicates your gender). Your system seems fine to me and beats the US system hands down haha

  • RabbitBBQ@lemmy.world
    7 days ago

    It’s more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.

    • KillingTimeItself@lemmy.dbzer0.com
      7 days ago

      i’ve heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.

        • KillingTimeItself@lemmy.dbzer0.com
          6 days ago

          On June 25, 2011, the Social Security Administration changed the SSN assignment process to “SSN randomization”, which did the following:

          The Social Security Administration does not reuse Social Security numbers. It has issued over 450 million since the start of the program, about 5.5 million per year. It says it has enough to last several generations without reuse and without changing the number of digits. https://www.ssa.gov/history/hfaq.html

          evidently they must be doing something else on the backend for this to be working, assuming there are quite literally 100M numbers, which is going to be static due to math, obviously, but they clearly can’t be reassigning numbers to 3 people on average at any given time, without some sort of external mechanism.

          There are approximately 420 million numbers available for assignment.

          https://www.ssa.gov/employer/randomization.html

          that certainly doesnt seem like it would support several generations, possibly at our current birth rate i suppose.

          DDG AI bullshit tells me that there are a billion codes. https://www.marketplace.org/2023/03/10/will-we-ever-run-out-of-social-security-numbers/ this article says it’s 1 billion

          https://www.ssn-verify.com/how-many-ssns

          this website also lists it as approximately 1 billion.

          • DacoTaco@lemmy.world
            5 days ago

            I think i see the change. They are mentioning the ssn is 9 numbers long, which is 1 longer than the 3-3-2 format wikipedia mentions. That does mean its around 999mil numbers, which ye allows for a few generations ( like, 1 or 2 lol )
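
The back-of-the-envelope math behind “a few generations” can be written out. The issued and per-year figures are the ones quoted from the SSA FAQ in this thread, and the 420 million figure is the one quoted from the SSA randomization page. Treating all 10^9 nine-digit combinations as usable is an assumption and an upper bound, since some ranges are never assigned.

```python
TOTAL_9_DIGIT = 10 ** 9          # 000000000 through 999999999 (upper bound, assumption)
ISSUED = 450_000_000             # "over 450 million" per the SSA FAQ quote
ISSUED_PER_YEAR = 5_500_000      # "about 5.5 million per year" per the same quote
AVAILABLE_PER_SSA = 420_000_000  # "approximately 420 million" per the randomization page


def years_left(available, per_year=ISSUED_PER_YEAR):
    """Years until `available` numbers run out at a constant issue rate."""
    return available / per_year


upper_bound_years = years_left(TOTAL_9_DIGIT - ISSUED)  # roughly 100 years
ssa_figure_years = years_left(AVAILABLE_PER_SSA)        # roughly 76 years
```

Both estimates assume the issue rate stays constant, which it will not exactly, but either way the pool lasts on the order of several decades to a century without reuse.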

  • Garlicsquash@lemmings.world
    7 days ago

    Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
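
A sketch of that primary key / foreign key relationship, again with `sqlite3`. Like the comment above, it assumes the SSN is the primary key of a hypothetical `person` table; the `payment` table and all values are invented.

```python
import sqlite3


def build_fk_db():
    """One person row, many payment rows that point back at it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(
        """
        CREATE TABLE person (
            ssn  TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE payment (
            payment_id   INTEGER PRIMARY KEY,
            ssn          TEXT NOT NULL REFERENCES person(ssn),
            paid_on      TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO person VALUES ('000-00-0001', 'Jane Doe');
        INSERT INTO payment (ssn, paid_on, amount_cents) VALUES
            ('000-00-0001', '2024-01-31', 150000),
            ('000-00-0001', '2024-02-29', 150000),
            ('000-00-0001', '2024-03-31', 150000);
        """
    )
    return conn


def payment_count(conn, ssn):
    """How many payment rows carry this SSN (repeats are expected here)."""
    return conn.execute(
        "SELECT COUNT(*) FROM payment WHERE ssn = ?", (ssn,)
    ).fetchone()[0]


def try_orphan_payment(conn):
    """A payment for an SSN with no person row is rejected by the FK."""
    try:
        conn.execute(
            "INSERT INTO payment (ssn, paid_on, amount_cents) "
            "VALUES ('000-00-9999', '2024-01-31', 1)"
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

The SSN is unique in `person` and repeated in `payment`. “De-duplicating” the second table by SSN would throw away two of Jane's three payments.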

    • werefreeatlast@lemmy.world
      7 days ago

      If this is true, then by this time we are all fucked. Like Monday someone checks their banking or retirement and it’s all gone. That’s gonna be a crazy day.

      I hope they’re not using the actual SSN as the primary key. I hope it’s a big ass number that is otherwise unrelated.

  • rational_lib@lemmy.world
    7 days ago

    To me I’m not really sure what his reply even means. I think it’s some attempt at a joke (because of course the government uses SQL), but I figure the joke can be broken down into two potential jokes that fail for different, embarrassing reasons:

    Interpretation 1: The government is so advanced it doesn’t use SQL - This interpretation is unlikely given that Elon is trying to portray the government as in need of reform. But it would make more sense if coming from a NoSQL type who thinks SQL needs to be removed from everywhere. NoSQL Guy is someone many software devs are familiar with who takes the sometimes-good idea of avoiding SQL and takes it way too far. Elon being NoSQL Guy would be dumb, but not as dumb as the more likely interpretation #2.

    Interpretation 2: The government is so backward it doesn’t use SQL - I think this is the more likely interpretation as it would be consistent with Elon’s ideology, but it really falls flat because SQL is far from being cutting-edge. There has kind of been a trend of moving away from SQL (with considerable controversy) over the last 10 years or so and it’s really surprising that Elon seems completely unaware of that.

    • dnick@sh.itjust.works
      7 days ago

      My guess is that he thinks SQL is an app or implementation like MS-SQL. It would be pretty surprising if the government didn’t use SQL as in relational databases, but if it doesn’t it’s even more unlikely that he understands even the first part over whether having duplicate SS numbers is in any way unexpected or unreasonable. Most likely one of the junior devs somewhere along the lines misunderstood a query and said something uninformed and mocking, and he took that as a good dig to toss into a tweet.

    • DahGangalang@infosec.pub (OP)
      7 days ago

      Thanks for the genuine response. Lol, most who interpret my question the way you did don’t seem interested in a good faith discussion. But ol’ boy is def tripping if he thinks SQL isn’t used in the government.

      Big thing I’m intending to pry at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

    • KillingTimeItself@lemmy.dbzer0.com
      7 days ago

      it’s probably using some sort of proprietary home grown database, because it’s probably old enough that no database could support what they needed, could be wrong on that one, but it was my best guess.

  • KillingTimeItself@lemmy.dbzer0.com
    7 days ago

    TL;DR: de-duplication in that form refers to a technique where two different pieces of data in the file system reference one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.

    You can imagine this would be very useful when taking backups for instance, we call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (its more complicated than this obviously, but you get the idea)

    now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
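
For what storage-level de-duplication looks like, here is a toy content-addressed store. `DedupStore` is a made-up class for illustration and not how any particular file system or database engine implements it.

```python
import hashlib


class DedupStore:
    """Toy store: identical content is kept once, however many names point at it."""

    def __init__(self):
        self.blocks = {}  # content digest -> bytes, one copy per unique content
        self.files = {}   # logical name -> content digest

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)  # store only if not already present
        self.files[name] = digest

    def read(self, name):
        return self.blocks[self.files[name]]
```

Writing the same bytes under two names yields two entries in `files` but one in `blocks`. Both names still exist and still read back correctly, so nothing visible to the user is removed; only disk usage changes.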

    The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

    Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything.

    • valtia@lemmy.world
      6 days ago

      i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another

      Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

      what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

      Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
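
A quick sketch of what “unique on SSN only” would do to a table that records a name change as a second row. The `person` table is hypothetical.

```python
import sqlite3


def insert_name_change_with_unique_ssn():
    """Try to store a second row for the same SSN under a UNIQUE constraint.

    Returns the database error message, or None if the insert succeeded.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (ssn TEXT UNIQUE, full_name TEXT)")
    conn.execute("INSERT INTO person VALUES ('000-00-0001', 'Jane Doe')")
    try:
        conn.execute("INSERT INTO person VALUES ('000-00-0001', 'Jane Smith')")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None
```

The second insert is rejected, so under that rule the only options are to overwrite the old row (losing history) or to move the history into another table.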

      • KillingTimeItself@lemmy.dbzer0.com
        6 days ago

        Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

        in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.

        Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

        and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

        • valtia@lemmy.world
          6 days ago

          in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.

          … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

          Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

          … I don’t think you understand how modern databases are designed

          • KillingTimeItself@lemmy.dbzer0.com
            6 days ago

            … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

            u were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries, i was just clarifying that.

            … I don’t think you understand how modern databases are designed

            it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.

            Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.

      • DacoTaco@lemmy.world
        6 days ago

        SSN being unique isn’t a dumb idea, it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence, to implement the idea you would first need to change the SSN format so that it is unique.

        Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.

        • KillingTimeItself@lemmy.dbzer0.com
          6 days ago

          Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.

          even then, i wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of “global id”
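
Something like that “row hash” is easy to sketch. `row_hash` is a hypothetical helper; note that it is a fingerprint of the row's content, so two rows with identical data get the same hash by design, and uniqueness across different rows is only probabilistic.

```python
import hashlib
import json


def row_hash(row):
    """Fingerprint a row (a dict of column -> value) independent of key order."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A hash like this helps detect exact duplicate rows or changed rows. It does not replace a real row identifier, because it changes whenever any column changes.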

  • 9point6@lemmy.world
    8 days ago

    The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

    The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

    If he knew the domain, he would know this isn’t an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it’s not an issue.
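
Assuming the composite key described above (the claim is the commenter's; the `beneficiary` table and its columns are invented for illustration), the constraint would behave like this:

```python
import sqlite3


def build_composite_key_table():
    """Table whose primary key is the pair (ssn, date_of_birth)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE beneficiary (
            ssn           TEXT NOT NULL,
            date_of_birth TEXT NOT NULL,
            full_name     TEXT NOT NULL,
            PRIMARY KEY (ssn, date_of_birth)
        )
        """
    )
    return conn


def try_insert(conn, ssn, dob, name):
    """Return True if the row was accepted, False if the key constraint rejected it."""
    try:
        conn.execute("INSERT INTO beneficiary VALUES (?, ?, ?)", (ssn, dob, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

The same SSN with a different birth date is accepted; only a repeat of the full pair is rejected. Looking at the table definition would show this before any “duplicate SSN” query is run.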

    The man continues to be a malignant moron

    • snooggums@lemmy.world
      8 days ago

      The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

      Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.

      Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don’t know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.

      https://www.ssa.gov/history/hfaq.html

      Q20: Are Social Security numbers reused after a person dies?

      A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.

      • Lightor@lemmy.world
        7 days ago

        My guess would be around your note. If someone mistakenly has two SSNs (due to fraud, error, or name changes), combining DOB helps detect inconsistencies.

        Some other possibilities, and I’m just throwing out ideas at this point:

        • Adding DOB could help with manual lookups and verification.
        • Using SSN + DOB ensures a standard key format across agencies, making it easier to link records.
        • Prevents accidental duplication if an SSN is mistyped.
        • Maybe the databases were optimized for fixed-length fields, and combining SSN + DOB fit within memory constraints.
        • It was easier to locate records with a “human-readable” key, whereas something like a UUID is harder for humans to read or sift through.
      • DahGangalang@infosec.pub (OP)
        8 days ago

        Beat me to asking this follow up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!

    • bitchkat@lemmy.world
      8 days ago

      The sheer size of the federal government and its age would mean there are thousands of databases out there. Some may be so old that they predate RDBMS/SQL.

      That alone makes his comment come from a place of ignorance. Of course it’s confident ignorance. The worst kind.

    • Dkarma@lemmy.world
      8 days ago

      Lol talk about burying the lede… The issue here is that the government absolutely uses SQL to traverse a DB and anyone who thinks otherwise is an idiot.

      • DahGangalang@infosec.pub (OP)
        8 days ago

        Naw, I definitely meant to be asking about duplication of data in databases (vs if the government actually uses SQL).

        Sorry to have communicated that so poorly. Everyone seems to be taking the angle you’re arguing though. Guess I’ll need to work on that.

    • orcrist@lemm.ee
      8 days ago

      Elon Musk is also an idiot. He thinks he’s smart enough to quickly understand complex situations and complex problems about which he knows next to nothing, within just a few minutes.

      Most people would only try to claim that level of understanding in areas with which they have professional experience or about which they’re extremely geeky. He does it with everything, and nobody can be an expert in everything, and everybody knows that except for narcissists.

      I suppose for non-tech people it might be convenient to assume that because someone knows something about some kind of tech, they therefore know a lot about all kinds of tech, and the reality is that’s just not true. There are so many fields that are totally different. But if it did, actually he would look even more idiotic, because Twitter is a train wreck, so clearly he’s incompetent in the tech field, right?

      • thatKamGuy@sh.itjust.works
        8 days ago

        The SSN is 9 digits long; so technically they would have to start re-using them after the billionth one. Given the current population size, and how many people have been born/died since its implementation - it’s fair to say they haven’t had to re-use any figures yet.

      • Maggoty@lemmy.world
        8 days ago

        But I was assured he was a materials engineer, rocket scientist, computer programmer, and businessman extraordinaire!

    • Phoenixz@lemmy.ca
      8 days ago

      Elon Musk is a (criminal) scammer, he always has been.

      He was fired for incompetence from his own company.

      Pretty much everything he’s promised for every company he has headed has been a lie. Tesla full self driving? Lie. Hyperloop? All lies to successfully kill high speed rail and start a movement that wasted billions of dollars, including taxpayer money. Even SpaceX, the least shit of all, is shit. Once you really look at it, it’s all promises with no results and lots of cheering when millions of taxpayer dollars, yet again, blow up in the sky.

      The guy has one quality: convincing people that he’s smart even though he literally doesn’t know shit

    • Aeao@lemmy.world
      8 days ago

      I’m not arguing that Elon musk is anything but an absolute tool.

      SS numbers have 999 million options. Are we already repeating them?

  • missingno@fedia.io
    8 days ago

    Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.

    • credo@lemmy.world
      8 days ago

      This explanation makes no sense in the context of OP’s question, given the order of comments…

      • finitebanjo@lemmy.world
        8 days ago

        Yeah, a better explanation is that Deduplicating Databases are an absolutely terrible idea for every use case, as it means deleting history from the database.

  • darkmarx@lemmy.world
    8 days ago

    “The government” is multiple agencies and departments. There is no single computer system, database, mainframe, or file store that the entire US government uses. There is no standard programming language used. There is no standard server configuration. Each agency is different. Each software project is different.

    When someone says the government doesn’t use SQL, they don’t know what they are talking about. It could be referring to the fact that many government systems are ancient mainframe applications that store everything in VSAM. But it is patently false that the government doesn’t use SQL. I’ve been on a number of government contracts over the years, spanning multiple agencies. MS SQL was used in all but one.

    Furthermore, some people share SSNs; they are not unique. It’s a common misconception that they are, but anyone working on government software learns this pretty quickly. The fact that it seems to be a big shock goes to show that he doesn’t know what he is doing and neither do the people reporting to him.

    Not only is he failing to understand the technology, he is failing to understand the underlying data he is looking at.

    • DahGangalang@infosec.pub (OP)
      8 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the Vice Bro doesn’t understand how SQL works).

      I’m not aware of any instance where two people share an SSN though. The Social Security Administration even goes as far as to say they don’t recycle the SSNs of dead people (it’s linked a couple of times in other comments, and Voyager doesn’t let me save drafts of comments, so I’ll edit this comment to add that link for you).

      Can you point me to somewhere showing multiple people can share an SSN?

      Edit: as promised: The Social Security FAQ page

      • WarlordSdocy@lemmy.world
        8 days ago

        I mean I don’t know a ton about SQL but one thing to keep in mind about SSNs is they were not originally meant to be used for identification but because we have no form of national id and places still needed a way to verify who you are people just started using SSNs for that since it’s something everyone has and there wasn’t really a better option. So now the government has been having to try and make them work for that and make them more secure. The better solution would be to make some form of national id that is designed to be secure but Republicans and people like Musk would probably call that government overreach or a way to spy and track people.

        • DahGangalang@infosec.pub (OP)
          8 days ago

          Ugh, YES, I am so frustrated at the counter arguments for this that I constantly hear spouted by my (ultra-conservative) family.

          I hope that notion re-enters the public consciousness as a part of this (not holding my breath tho)

      • socsa@piefed.social
        8 days ago

        My wife has a tax payment history under two different legal names which share a single SSN

        • DahGangalang@infosec.pub (OP)
          8 days ago

          Hmmm, well I can’t speak to how the actual databases are put together, so maybe they would have that as two separate unique primary keys with a duplicated SSN.

          But it really seems like bad design if they put it together that way…

          • JoeyJoeJoeJr@lemmy.ml
            8 days ago

            Worth noting is that “good” database design evolved over time (https://en.wikipedia.org/wiki/Database_normalization). If anything was set up pre-1970s, they wouldn’t have even had the conception of the normal forms used to cut down on data duplication. And even after they were defined, it would have been quite a while before the concepts trickled down from academia to the engineers actually setting up the databases in production.

            On top of that, name to SSN is a many-to-many relationship - a single person can legally change their name, and may have to apply for a new SSN (e.g. in the case of identity theft). So even in a well normalized database, when you query the data in a “useful” form (e.g. results include name and SSN), it’s probably going to appear as if there are multiple people using the same SSN, as well as multiple SSNs assigned to the same person.
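
            A rough sketch of what that looks like, using Python’s built-in sqlite3. The table and column names here are made up for illustration (nothing is known about the real schema), and the SSNs are deliberately invalid placeholder values. One person with two legal names and two SSNs comes back as four rows once you join everything into a “useful” shape:

```python
import sqlite3

# Hypothetical schema for illustration only; not how any real agency stores data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person      (person_id INTEGER PRIMARY KEY);
    CREATE TABLE person_name (person_id INTEGER REFERENCES person, legal_name TEXT);
    CREATE TABLE person_ssn  (person_id INTEGER REFERENCES person, ssn TEXT);

    INSERT INTO person VALUES (1);
    -- One person, two legal names over time (e.g. a name change) ...
    INSERT INTO person_name VALUES (1, 'Jane Doe'), (1, 'Jane Smith');
    -- ... and two SSNs over time (e.g. reissued after identity theft).
    INSERT INTO person_ssn VALUES (1, '000-00-0001'), (1, '000-00-0002');
""")

# Join names and SSNs back together through the shared person_id.
rows = conn.execute("""
    SELECT n.legal_name, s.ssn
    FROM person AS p
    JOIN person_name AS n ON n.person_id = p.person_id
    JOIN person_ssn  AS s ON s.person_id = p.person_id
    ORDER BY s.ssn, n.legal_name
""").fetchall()

for row in rows:
    print(row)
```

            Each SSN shows up next to two names and each name next to two SSNs, even though the underlying tables describe exactly one person.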

      • kboy101222@sh.itjust.works
        8 days ago

        I’d imagine the numbers of dead people eventually get reused. Nine digits only gives you about a billion possible numbers, and we have over a third of that many people in existence right now.

  • Nate Cox@programming.dev
    8 days ago

    Because a simple query would have shown that SSN was a compound key with another column (birth date, I think), and not the identifier he thinks it is.

    • BombOmOm@lemmy.world
      8 days ago

      Why would one person, one SSN ever have two different birth dates? That sounds like an issue all unto itself.

      • geoff@lemm.ee
        8 days ago

        I think what he means is that the unique identifier for a database record is a composite of two fields: SSN + birth date. That doesn’t mean that SSN to birth date is a one-to-many relation.

        • DahGangalang@infosec.pubOP
          8 days ago

          But they are implying SSN to SSN+Birthdate is a one-to-many relationship. Since SSN to SSN should be one-to-one, you can conclude the SSN to Birthdate is one-to-many, right?

          • Nate Cox@programming.dev
            8 days ago

            No, who said there was a relationship?

            A compound key is a composite key where one or both sides can be foreign keys to other tables themselves; it’s a safe assumption this is probably true in a large data set like social security. A composite key is a candidate key (a key that uniquely identifies a row) made up of more than one column.

            This basically means that there is a finite number of available SSNs because they’re only 9 digits long and someone intends to recycle SSNs after the current user of one dies. Linking it to birthday is “unique enough” as to never recur.
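
            To make the composite key idea concrete, here is a minimal sketch in Python with the built-in sqlite3. The `beneficiary` table, its columns, and the values are all hypothetical, and whether the real system is keyed this way is an assumption. It only shows what such a key enforces: the pair (SSN, birth date) has to be unique, while the SSN alone does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key is the PAIR (ssn, birth_date).
conn.execute("""
    CREATE TABLE beneficiary (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        full_name  TEXT,
        PRIMARY KEY (ssn, birth_date)
    )
""")

conn.execute("INSERT INTO beneficiary VALUES ('000-00-0001', '1940-03-02', 'A. Example')")
# Same SSN, different birth date: allowed, because the pair is still unique.
conn.execute("INSERT INTO beneficiary VALUES ('000-00-0001', '1975-06-15', 'B. Example')")

# Same SSN AND same birth date: rejected by the composite primary key.
try:
    conn.execute("INSERT INTO beneficiary VALUES ('000-00-0001', '1940-03-02', 'C. Example')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ssn_count = conn.execute(
    "SELECT COUNT(*) FROM beneficiary WHERE ssn = '000-00-0001'"
).fetchone()[0]
print(ssn_count, duplicate_rejected)
```

            So a naive “count rows per SSN” query against a table like this can report more than one row for an SSN without any key constraint having been violated.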

            • DahGangalang@infosec.pubOP
              8 days ago

              I think I was getting some wires crossed and/or misunderstood what geoff (parent commenter to my last comment) was saying, so my comment may be misdirected some.

              But according to The Social Security FAQ page, SSNs are not recycled, so that data (especially when compounded and hashed with other data) should be able to establish a one-to-one relationship between each primary key and an SSN, thus having SSNs appear associated with multiple primary keys is a concern.

              Other comments have pointed to other explanations for why SSNs could appear to occur multiple times, but those amount to “it appeared in a different field associated with the same primary key”. I think that’s the most likely explanation of things.

              • jj4211@lemmy.world
                8 days ago

                Note that it being only part of a key is a technology choice that does not require reality to map to it. It may seem like overkill, but someone may not trust the political process to preserve that promise and so they add the birthdate, just in case something goes sideways in the future. Lots of technical choices are made anticipating likely changes and problems and designing things to be extra robust in the face of those.

      • DahGangalang@infosec.pubOP
        8 days ago

        A weak example would be my grandma. She was born before social security and was told as a kid that she was born in 1938. I guess in the olden days you just didn’t need to pass your birth certificate around for anything, so it wasn’t until she went to get married at ~age 25 that she needed her birth certificate. When she got it, it actually said she was born in 1940 (I forget the actual years, but I remember it was a two year and two day gap between dates).

        It’s a weak example that should apply to only a microscopic portion of the population, but I could see her having some weird records in the databases as a result.

        Edit: brain dropped out and I forgot part of a sentence.

  • GaMEChld@lemmy.world
    8 days ago

    Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.

    SQL is damn near everywhere where data sets are found.

    • DahGangalang@infosec.pubOP
      8 days ago

      Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.

      Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).

      • aesthelete@lemmy.world
        7 days ago

        SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.

        I mean, just think about this a little bit: if the purpose is transactions or something and each row has an SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.

        A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
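
        Here’s roughly what that looks like, sketched in Python with the built-in sqlite3. The `payment` table and its columns are invented for the example and the SSNs are placeholder values; the point is only that repeated SSNs in a transactional table are normal, and `GROUP BY` collapses them into one total per SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical transactional table: one row per payment, so an SSN repeats.
conn.executescript("""
    CREATE TABLE payment (
        payment_id   INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO payment (ssn, amount_cents) VALUES
        ('000-00-0001', 120000),
        ('000-00-0001', 120000),
        ('000-00-0001', 125000),
        ('000-00-0002',  90000);
""")

# Transactional totals grouped by SSN.
totals = conn.execute("""
    SELECT ssn, COUNT(*) AS payments, SUM(amount_cents) AS total_cents
    FROM payment
    GROUP BY ssn
    ORDER BY ssn
""").fetchall()

for row in totals:
    print(row)
```

        The first SSN appears in three rows of the table and still comes out as a single line in the grouped result.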

        There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.

        • DahGangalang@infosec.pubOP
          7 days ago

          “had to join to it”

          I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?

          I feel like this is showing my own ignorance on the back end of databasing. Can you point me to references that explain this better?

          • aesthelete@lemmy.world
            7 days ago

            I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.

            https://www.w3schools.com/sql/sql_join.asp
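
            As a small illustration (Python with the built-in sqlite3; the `identity` and `payment` tables are hypothetical and the values are placeholders), this is the “reference id in the transactional table” layout. The payment rows only carry a `person_id`, so any query that wants a name or SSN has to join back to the identity table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized layout: identities live in one table,
# payments only hold a reference (person_id) to them.
conn.executescript("""
    CREATE TABLE identity (
        person_id INTEGER PRIMARY KEY,
        ssn       TEXT NOT NULL,
        full_name TEXT NOT NULL
    );
    CREATE TABLE payment (
        payment_id   INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES identity(person_id),
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO identity VALUES
        (1, '000-00-0001', 'A. Example'),
        (2, '000-00-0002', 'B. Example');
    INSERT INTO payment (person_id, amount_cents) VALUES
        (1, 120000), (1, 125000), (2, 90000);
""")

# Inner join: combine each payment row with its matching identity row.
rows = conn.execute("""
    SELECT i.full_name, i.ssn, p.amount_cents
    FROM payment AS p
    JOIN identity AS i ON i.person_id = p.person_id
    ORDER BY p.payment_id
""").fetchall()

for row in rows:
    print(row)
```

            With two tables this is cheap. The trouble I’m describing starts when every query in the system has to go through the same table this way.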

            Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.

            DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.

            https://en.m.wikipedia.org/wiki/Database_normalization

            In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.

            All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to an SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)
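
            For what it’s worth, finding values that occur more than once is about a four-line query. A minimal sketch in Python with the built-in sqlite3, where the `record` table and its rows are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table where one SSN appears under two names.
conn.executescript("""
    CREATE TABLE record (
        record_id INTEGER PRIMARY KEY,
        ssn       TEXT,
        full_name TEXT
    );
    INSERT INTO record (ssn, full_name) VALUES
        ('000-00-0001', 'Jane Doe'),
        ('000-00-0001', 'Jane Smith'),
        ('000-00-0002', 'B. Example');
""")

# Keep only the SSNs that occur in more than one row.
dupes = conn.execute("""
    SELECT ssn, COUNT(*) AS occurrences
    FROM record
    GROUP BY ssn
    HAVING COUNT(*) > 1
""").fetchall()

print(dupes)
```

            The query only tells you that a value repeats. Whether that repetition is a problem depends on what the table is for.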

      • GaMEChld@lemmy.world
        8 days ago

        Oh, well another user pointed out that SSNs are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as unique identifiers for our populace, and if it were it would be more comprehensive than an insecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSNs for identity purposes. Poop.

        • 【J】【u】【s】【t】【Z】@lemmy.world
          8 days ago

          I’m pretty sure there is a federal statute that says ONLY the SSA may collect or use SSNs, as to federal agencies. I argued it once when a federal agency court tried to tell me that it couldn’t process part of my client’s case without it. I didn’t care but my client was crotchety and would only even give me the last four.

          Edit. It’s a regulation:

          https://www.law.cornell.edu/cfr/text/28/802.23

          An agency cannot require disclosure of an SSN for any right or benefit unless a specific federal statute requires it or the agency required the disclosure prior to 1975.

          In my case the agency got back to me with some federal statute that didn’t say what they said it said, and eventually they had to admit they were wrong.