• grabyourmotherskeys@lemmy.world · ↑ 232 · 1 year ago

    I haven’t read the article because documentation is overhead, but I’m guessing the real reason is that the guy who kept saying they needed to add more storage was repeatedly told to calm down and stop overreacting.

    • krellor@kbin.social · ↑ 115 · 1 year ago

      I used to do some freelance work years ago, and a number of my customers operated assembly lines. I specialized in emergency database restoration, and the assembly line folks were my favorite customers. They knew exactly how much every hour of downtime cost them, and they never balked at my rates and minimums.

      The majority of the time the outages were due to failure to follow basic maintenance, and log files eating up storage space was a common culprit.

      So yes, I wouldn’t be surprised at all if the problem was something the local IT staff had called out, only to be overruled for one reason or another.

      • Oliver Lowe@lemmy.sdf.org · ↑ 35 · 1 year ago

        and log files eating up storage space was a common culprit.

        Another classic symptom of poorly maintained software: constant announcements of trivial nonsense, like “[INFO]: Sum(1, 1) - got result 2!”, filling up disks.

        I don’t know if the systems you’re talking about are like this, but it wouldn’t surprise me!
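A minimal sketch of the usual fix for that kind of chatter, using Python's standard `logging` module: set the logger's level to WARNING so routine INFO records are discarded before any handler writes them. The logger name `assembly_line` and the in-memory buffer (standing in for a log file) are illustrative assumptions, not taken from any real system.

```python
import io
import logging

# Hypothetical logger name; any application logger behaves the same way.
logger = logging.getLogger("assembly_line")
logger.propagate = False  # don't also pass records up to the root logger

# An in-memory buffer stands in for the log file in this demo.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s]: %(message)s"))
logger.addHandler(handler)

# Only WARNING and above get through; routine INFO chatter is discarded.
logger.setLevel(logging.WARNING)

logger.info("Sum(1, 1) - got result 2!")     # dropped, never written
logger.warning("Sum(1, 1) - got result 3!")  # written: this one matters

print(buffer.getvalue(), end="")
```

Raising the level is a blunt instrument; the point is only that verbosity is usually a configuration choice, so a noisy logger can often be quieted without touching the code that emits the messages.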

        • DukeMcAwesome@lemmy.dbzer0.com · ↑ 24 · 1 year ago

          You gotta forward that to Splunk so your logs ain’t filling up the server generating them. Plus you can set up automated alerts for when the result stops being 2.

          This message brought to you by Big Splunk.

        • afraid_of_zombies@lemmy.world · ↑ 2 · ↓ 1 · 1 year ago

          Yeah, a few levels.

          Level 1: complex standalone devices, mostly firmware.

          Level 1a: stuff slightly more complicated than a list of settings, usually for something like a VFD or a stepper motor controller. Not as common.

          Level 2: PLCs, HMIs, and the black magic robotic stuff. Standalone equipment. Imagine a machine that can take something, heat it up, and hand it to the next machine.

          Level 3: DCS (distributed control system) and SCADA (supervisory control and data acquisition, which I always forget). This layer typically integrates, or at least collects data from, multiple pieces of Level 2 standalone equipment.

          Level 4: the integration layer between Level 3 and whatever system the company uses for entering sales.

          Like everything in software, this is all general. Some places will mix layers, subtract layers, or add them. I would complain about the inconsistent nature of it all, but without it I would be unemployed.

          • Pat12@lemmy.world · ↑ 1 · 1 year ago

            Is this about specific software engineering languages, or is it electrical engineering? What kind of work is this?

            • afraid_of_zombies@lemmy.world · ↑ 1 · ↓ 1 · 1 year ago

              I am having trouble understanding your questions. I generally work at Level 2, where we typically use graphical languages; when we do use scripting languages, it is usually to generate the graphical code. The two most common graphical languages are function block diagrams (FBD) and ladder logic. Both have a general form and vendor-specific quirks.

              For scripting I tend towards Perl or Python, but I have seen other guys use different methods.

              Level 3 uses pretty much the same tools. For Level 4, I have in the past used a Modbus/TCP approach, but I can’t really say that is typical. One guy I know used the Python API to do it.
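For context on the Modbus/TCP mention: it is a simple binary request/response protocol, commonly served on TCP port 502, where each message is a 7-byte MBAP header followed by a function code and its data. The sketch below packs and unpacks function code 3 (Read Holding Registers) frames with Python's `struct` module, following the published Modbus frame layout. The helper names are hypothetical, there is no socket or retry logic, and it is not the method the commenter used; in practice an existing Modbus library would usually handle this.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # Modbus function code 3


def build_read_request(transaction_id: int, unit_id: int,
                       start_address: int, count: int) -> bytes:
    """Pack a Modbus/TCP 'Read Holding Registers' request.

    MBAP header: transaction id, protocol id (always 0), length of the bytes
    that follow the length field, unit id. Then the PDU: function code,
    starting address, register count. Everything is big-endian.
    """
    if not 1 <= count <= 125:
        raise ValueError("Modbus allows 1 to 125 registers per read")
    # Bytes after the length field: unit id (1) + function (1)
    # + start address (2) + count (2) = 6.
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id,
                       READ_HOLDING_REGISTERS, start_address, count)


def parse_read_response(frame: bytes) -> list[int]:
    """Return the 16-bit register values from a 'Read Holding Registers' reply."""
    if len(frame) < 9:
        raise ValueError("frame too short")
    function = frame[7]  # byte after the 7-byte MBAP header
    if function & 0x80:
        # Exception replies set the high bit; the next byte is the exception code.
        raise RuntimeError(f"Modbus exception code {frame[8]}")
    byte_count = frame[8]
    data = frame[9:9 + byte_count]
    if len(data) != byte_count or byte_count % 2:
        raise ValueError("truncated or malformed register data")
    # Each register is one big-endian unsigned 16-bit word.
    return list(struct.unpack(f">{byte_count // 2}H", data))
```

For example, `build_read_request(1, 1, 0, 10)` asks unit 1 for ten registers starting at address 0.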

              • Pat12@lemmy.world · ↑ 1 · 1 year ago

                oh, thank you

                my background is not in engineering, which explains my confusing questions

    • DontMakeMoreBabies@kbin.social · ↑ 46 · 1 year ago

      I’m this person in my organization. About two years ago I sent an email up the chain warning folks that we were eventually going to run out of space.

      Guess what just recently happened?

      ShockedPikachuFace.gif

      • IMongoose@lemmy.world · ↑ 5 · 1 year ago

        Sometimes that person is very silly, though. We had a vendor call us saying we needed to clear our logs ASAP!!! due to their size. The log file was, no joke, 20 years old. At the current rate, our disk would be full in another 20 years. We cleared it, but like, calm down, dude.

      • Mike D.@lemm.ee · ↑ 2 · 1 year ago

        Can’t you just add a few external USB drives? (I heard this more than once at an NGO think tank.)

        • grabyourmotherskeys@lemmy.world · ↑ 5 · 1 year ago

          I mean, I’ve worked at a hosting company that had a bunch of static sites running off an SSD connected to the server by USB, so this did happen back in the day. I try not to think about those days.

          “What’s that? Your accounting front end, built in obsolete FrontPage code on an Access database, isn’t working again? It’s probably a file lock; I’ll restart IIS.”

    • Dojan@lemmy.world · ↑ 18 · 1 year ago

      Ballast!

      Just plonk a large file on the storage, sized to roughly what normally gets used in a work week or so. Then when shit hits the fan, delete the ballast and you’ll suddenly have bought a week to “find” and implement a solution. You’ll be hailed as a hero, rather than being the annoying doomer who keeps bothering people about technical stuff that’s irrelevant to the here and now.
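The sizing rule here (enough ballast to cover about a week of normal growth) is simple arithmetic. A small sketch, assuming you already know the average daily growth from monitoring history; the function name, the 4 KiB block size, and the 7-day default are placeholder assumptions, and a five-day work week would be just as reasonable.

```python
def ballast_size_bytes(daily_growth_bytes: int, days_of_runway: float = 7.0,
                       block_size: int = 4096) -> int:
    """Size a ballast file to cover `days_of_runway` days of normal growth.

    The result is rounded up to a whole number of filesystem blocks
    (4 KiB assumed here), so deleting the file frees at least that much.
    """
    raw = int(daily_growth_bytes * days_of_runway)
    blocks = -(-raw // block_size)  # ceiling division without floats
    return blocks * block_size
```

For instance, a volume that grows by about 2 GiB a day would get a 14 GiB ballast file with the defaults.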

      • Malfeasant@lemm.ee · ↑ 2 · 1 year ago

        Except then they’ll decide you fixed it, so nothing more needs to be done. I’ve seen this happen more than once.

  • Semi-Hemi-Demigod@kbin.social · ↑ 100 · ↓ 4 · 1 year ago

    Sysadmin pro tip: keep a 1-10 GB file of random data named DELETEME on your data drives. Then, if this happens, you can delete it for some quick breathing room while you fix things.

    Also, set up alerts for disk space.
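A sketch of creating the DELETEME file, assuming Python 3 on a POSIX-like system. Writing random bytes (rather than zeros or a sparse file) makes sure the space is really consumed; that is presumably why the tip specifies random data, since compression or deduplication in the storage layer could otherwise shrink the file. The function name, chunk size, and example path are arbitrary assumptions.

```python
import os

CHUNK = 1024 * 1024  # write 1 MiB at a time to keep memory use flat


def create_ballast(path: str, size_bytes: int) -> None:
    """Create a file of `size_bytes` random (incompressible) bytes at `path`."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out to disk before returning
```

Usage would be something like `create_ballast("/data/DELETEME", 5 * 1024**3)` for a 5 GiB file; the path is hypothetical.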

      • nfh@lemmy.world · ↑ 17 · 1 year ago

        Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.

        • RupeThereItIs@lemmy.world · ↑ 4 · 1 year ago

          A system this critical is on a SAN; if you’re alerting properly, adding a bit more storage space is a five-minute task.

          It should also have a DR solution, yes.

      • looz@sopuli.xyz · ↑ 5 · 1 year ago

        There are cases where a disk fills up faster than anyone can reasonably react, even with alerts in place. And sometimes the culprit is something you can’t just go and kill.
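One partial mitigation for fast fills is alerting on the growth rate rather than only on a fixed threshold: extrapolate time-to-full from two usage samples and page someone if it is shorter than a realistic reaction time. A minimal sketch, assuming roughly linear growth; the function names and the 4-hour reaction window are hypothetical. As the comment says, this still can't help if the disk fills faster than the sampling interval.

```python
def seconds_until_full(used_then: int, used_now: int, elapsed_s: float,
                       capacity: int) -> float:
    """Linearly extrapolate how long until the disk is full.

    Returns infinity if usage is flat or shrinking.
    """
    if elapsed_s <= 0:
        raise ValueError("samples must be taken at different times")
    rate = (used_now - used_then) / elapsed_s  # bytes per second
    if rate <= 0:
        return float("inf")
    return (capacity - used_now) / rate


def should_page(used_then: int, used_now: int, elapsed_s: float,
                capacity: int, reaction_time_s: float = 4 * 3600) -> bool:
    """Alert when the projected time-to-full is shorter than the time a
    human realistically needs to respond (4 hours assumed here)."""
    return seconds_until_full(used_then, used_now, elapsed_s, capacity) < reaction_time_s
```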

        • afraid_of_zombies@lemmy.world · ↑ 1 · ↓ 1 · 1 year ago

          Had an issue like that a few years back: a standalone device that was filling up quickly. The poorly designed device could only be flushed via USB sticks. I told them they had to do it weekly. Guess what they didn’t do. Looking back, I should have made it alarm and flash once a week on a timer.

      • ipkpjersi@lemmy.ml · ↑ 1 · 1 year ago

        A lot of companies have minimal alerting or no alerting at all. It’s kind of wild. I literally have better alerting in my home setup than many companies do lol

    • Lem453@lemmy.ca · ↑ 20 · 1 year ago

      Even better: a cron job every 5 minutes that, if the remaining space falls to 5%, automatically deletes the file and sends a message to the sysadmin.
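A sketch of that check in Python, using the standard library's `shutil.disk_usage`. The 5% threshold comes from the comment; the function names, mount point, ballast path, script path, and the `notify` callback (where an email, chat, or pager integration would go) are all hypothetical. It keeps alerting on later runs after the ballast is gone, since low space is still a problem then.

```python
import shutil
from pathlib import Path
from typing import Callable


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_and_release(mount: str, ballast: Path, notify: Callable[[str], None],
                      threshold: float = 0.05) -> bool:
    """If free space has fallen to `threshold` or below, delete the ballast
    file and notify. Returns True only on the run that released it."""
    frac = free_fraction(mount)
    if frac > threshold:
        return False
    if ballast.exists():
        ballast.unlink()
        notify(f"{mount}: {frac:.1%} free; deleted ballast {ballast}")
        return True
    notify(f"{mount}: {frac:.1%} free and no ballast left to release")
    return False

# Example crontab entry (hypothetical script path), every 5 minutes:
#   */5 * * * * /usr/bin/python3 /opt/scripts/release_ballast.py
# where the script would end with something like:
#   check_and_release("/data", Path("/data/DELETEME"), notify=print)
```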

      • Semi-Hemi-Demigod@kbin.social · ↑ 14 · 1 year ago

        Sends a message and gets the services ready for potential shutdown. Or implements a rate limit to keep the service available but degraded.
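The comment doesn't say how the rate limit would work; a token bucket is one common choice. A minimal sketch: tokens refill at `rate` per second up to `capacity`, and each request spends one, so sustained load is capped while short bursts still get through. The class name and parameters are illustrative, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or defer this request
```

One could, hypothetically, lower `rate` when the disk-space alert fires, degrading the service instead of letting it fall over.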

      • bug@lemmy.one · ↑ 2 · 1 year ago

        At that point, just set the limit a few gigs higher and don’t have the decoy file at all.

    • dx1@lemmy.world · ↑ 15 · 1 year ago

      The real pro tip is to put the core system and anything on it that eats up disk space on separate partitions, along with alerting, log rotation, etc., and to avoid single points of failure in general. It’s hard to say exactly what went wrong with Toyota, but they probably could have planned better for it in a general way.
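Partitioning is a disk-layout decision, but the log-rotation half is easy to show. A sketch using Python's standard `logging.handlers.RotatingFileHandler`, which rolls the file over before it would reach `maxBytes` and keeps `backupCount` old copies, so the total log footprint stays roughly bounded at `max_bytes * (backups + 1)`. The function name and sizes are placeholders; system-wide logs are more often handled by a tool such as logrotate, which isn't shown here.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(name: str, path: str,
                         max_bytes: int = 10 * 1024 * 1024,
                         backups: int = 5) -> logging.Logger:
    """Return a logger whose file at `path` never grows past `max_bytes`.

    On rollover, path becomes path.1 (older copies shift to .2, .3, ...)
    and anything beyond `backups` copies is deleted.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger
```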

    • Maximilious@kbin.social · ↑ 16 · ↓ 1 · 1 year ago

      10GB is nothing in an enterprise datastore housing PBs of data. 10GB is nothing for my 80TB homelab!

    • z00s@lemmy.world · ↑ 6 · 1 year ago

      Or make the file a little larger and wait until you’re up for a promotion…

  • MoogleMaestro@kbin.social · ↑ 38 · ↓ 1 · 1 year ago

    There’s some irony to every tech company modeling their pipeline off Toyota’s Kanban system…

    Only for Toyota to completely fuck up their tech by running out of disk space for their system to exist on. Looks like someone should have put “Buy more hard drives” on the board.

  • MechanicalJester@lemm.ee · ↑ 28 · ↓ 1 · 1 year ago

    I blame lean philosophy. Keeping spare parts and redundancy is expensive, so definitely don’t do it… which is just rolling the dice until it comes up snake eyes and your plant shuts down.

    It’s the “save 5% yearly by no longer trying to avoid a daily 5% chance of disaster” approach.

    Overprepared is silly, but so is underprepared.

    They were underprepared.
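Taking the rhetorical 5% figure literally, and assuming each day is an independent draw, the odds of getting through a 365-day year with no disaster at all are vanishingly small:

$$
P(\text{at least one disaster in a year}) = 1 - (1 - 0.05)^{365} \approx 1 - 7.4 \times 10^{-9}
$$

In other words, a daily 5% risk is not a gamble over any meaningful horizon; it is a near-certainty within a year.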

    • I_Has_A_Hat@lemmy.ml · ↑ 24 · 1 year ago

      I work at a manufacturing company that was owned by its founder for 50 years, until he retired about 4 years ago. He disagreed with a lot of the ideas behind lean manufacturing, so we had something like 5 years’ worth of inventory sitting in our warehouse.

      When the new management came in, there was a lot of squawking about inefficiency, how wasteful it was to keep so much raw material on the shelf, and how we absolutely needed to sell it off or get rid of it.

      Then a funny little thing happened in 2020.

      Suddenly, we were the only company in our industry still churning out product. Other companies were calling us, desperate to buy our products or even just our raw material. We saw MASSIVE growth over the next two years and came out of the pandemic better than ever. And it was mostly thanks to the old owner’s view that “Just In Time” manufacturing was BS.

      • daq@lemmy.sdf.org · ↑ 6 · ↓ 4 · 1 year ago

        Cool story, but a once-every-150-years pandemic is hardly a good reason to keep wasting money on storing stuff. A fire or a flood was much more likely to wipe it all out in 50 years.

        Even in your anecdote the owner never actually benefited from the extra costs.

        Depending on what you’re producing, the cost of maintaining extra raw-material inventory can be massive, and for a company the size of Toyota, multiply that by a million.
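To put the comment's own numbers side by side: if a once-every-150-years pandemic is treated as an independent 1-in-150 chance per year (a simplifying assumption), the probability of at least one occurring during the 50 years the founder ran the company is

$$
P = 1 - \left(1 - \frac{1}{150}\right)^{50} \approx 0.28
$$

That is roughly 28%, which is not negligible; the same formula would apply to the fire or flood risk, though the comment gives no figure for it.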

        • Kurroth@lemmy.world · ↑ 2 · 1 year ago

          Even in your anecdote the owner never actually benefited from the extra costs.

          Imagine doing something, or having a life/business philosophy, that doesn’t exist for your own sole benefit and maybe exists for the benefit of others.

      • afraid_of_zombies@lemmy.world · ↑ 2 · ↓ 1 · 1 year ago

        Oh man. I work for an OEM and have almost the same story, except the CEO didn’t retire, so our shelves were never bare. A large part of our sales last year was because our competition couldn’t source parts and we could. Also, since they have a skeleton engineering crew, they couldn’t figure out how to improvise (at least this is what our salespeople are claiming).

        In 2022 I spent 3 hours a day, every single working day, just making ECNs for parts that were out of stock.

        Lean JIT is such fucking crap. When a customer line goes down, they are bleeding who knows how many millions a day. They want a solution right freaken now. So they call the OEM, and we tell them the part that died is right here on the shelf and we can overnight it.

        But you know how you really know that JIT evangelical types are full of it? Talk to any of them now and they won’t say something like “look, normally it works, but no one could have predicted this.” They will double down and say it works but was implemented incorrectly. It isn’t “true” JIT.

        Many years ago I read a sentence that I use every single week of my life: “all ideology can do is reassert itself endlessly.” When I hear crap like that, that X works if and only if it is true X, I know I am dealing with ideology. When I hear that X works under very specific situations and only gives specific results, I know I am dealing with facts. Yes, if the customer is never in a rush, there isn’t a world-shaking event, and you are running out of storage capacity, JIT might be an answer.

  • blazera@kbin.social · ↑ 20 · 1 year ago

    This is a fun read in the wake of learning about all the personal data car manufacturers have been collecting.

  • R0cket_M00se@lemmy.world · ↑ 9 · 1 year ago

    Was this that full shutdown everyone thought was going to be malware?

    The worst malware of all, unsupervised junior sysadmins.

  • RFBurns@lemmy.world · ↑ 6 · ↓ 1 · 1 year ago

    Storage has never been cheaper.

    There’s going to be a seppuku session in somebody’s IT department.