New accessibility feature coming to Firefox, an “AI powered” alt-text generator.


"Starting in Firefox 130, we will automatically generate an alt text and let the user validate it. So every time an image is added, we get an array of pixels we pass to the ML engine and a few seconds after, we get a string corresponding to a description of this image (see the code).

Our alt text generator is far from perfect, but we want to take an iterative approach and improve it in the open.

We are currently working on improving the image-to-text datasets and model with what we’ve described in this blog post…"

  • pr06lefs@lemmy.ml
    4 months ago

    I like this approach of having a model locally and running it locally. I’ve been using the Firefox website translator and it’s great. Handy, and it doesn’t send my data to Google. That I know of, ha.

    • InfiniWheel@lemmy.one
      4 months ago

      The only issue with Firefox’s translator currently is the time it takes to load at first, or the fact that you have to download each model first. It’s not some monumental task, but it does have more friction than Google’s “automatically send the site you are browsing to our server”.

  • jherazob@beehaw.org
    4 months ago

    Now i want this standalone in a commandline binary, take an image and give me a single phrase description (gut feeling says this already exists but depending on Teh Cloudz and OpenAI, not fully local on-device for non-GPU-powered computers)

      • jherazob@beehaw.org
        4 months ago

        So, it’s possible to build but no one has made it yet? Because i have negative interest in messing with that kinda tech, and would rather just “apt-get install whatever-image-describing-gizmo” so i wouldn’t be the one who does it

        • Swedneck@discuss.tchncs.de
          4 months ago

          this is how i feel about basically all technology nowadays, it’s all so artificially limited by capitalism.

          nothing fucking progresses unless someone figures out a way to monetize it or an autistic furry decides to revolutionize things in a weekend because they were bored and inventing god was almost stimulating enough

        • The Doctor@beehaw.org
          4 months ago

          Folks have made it - I think ollama was name-checked specifically because it’s on GitHub and in Homebrew and in some distros’ package repositories (it’s definitely in Arch’s). I think some folks (at least) aren’t talking about it because of the general hate-on folks have for LLMs these days.

          • jherazob@beehaw.org
            3 months ago

            I don’t want an LLM to chat with or whatever folks do with those things, i want a command i can just install, i call the binary on a terminal window with an image of some sort as a parameter, it returns a single phrase describing the image, on a typical office machine with no significant GPU and zero internet access.

            Right now i cannot do this as far as i know. Pointing me at some LLM and “Go build yourself something with that” is the direct opposite of what i stated that i desire. So, it doesn’t currently seem to exist, that’s why i stated that i wished somebody ripped it off the Firefox source and made it a standalone command.

      • Zworf@beehaw.org
        3 months ago

        Yes I was just writing that, I would love to see more integrations that can talk against ollama.

  • ClassifiedPancake@discuss.tchncs.de
    4 months ago

    When I used a similar feature in Ice Cubes (Mastodon app) it generated very detailed but ultimately useless text because it does not understand the point of the image and focuses on things that don’t matter. Could be better here but I doubt it. I prefer writing my own alt text but it’s better than nothing.

  • Kissaki@beehaw.org
    3 months ago

    From your OP description:

    "EDIT: the AI creates an initial description, which then receives crowdsourced additional context per-image to improve generated output. look for the “Example Output” heading in the article."

    That’s wrong. Nothing is crowdsourced. What the article says is that when you add an image in the PDF editor, it can generate an alt text for the image, and you as the user validate and confirm it. That’s still local PDF editing though.

    The caching part is about the model dataset, which is static.

  • Zworf@beehaw.org
    3 months ago

    One thing I’d love to see in Firefox is a way to offload the translation engine to my local ollama server. This way I can get much better translations but still have everything private.

  • Kissaki@beehaw.org
    3 months ago

    So, planned experimentation and availability:

    1. PDF editor when adding an image in Firefox 130
    2. PDF reading
    3. [hopefully] general web browsing

    Sounds like a good plan.


    "Once quantized, these models can be under 200MB on disk, and run in a couple of seconds on a laptop – a big reduction compared to the gigabytes and resources an LLM requires."

    While that is a reasonable size for laptops and desktops, the couple of seconds could still be a bit of a hindrance. Nevertheless, a significant unblock for blind/text users.

    I wonder what it would mean for mobile. If it’s an optional accessibility feature, then with today’s smartphone storage space I think it can work well though.


    "Running inference locally with small models offers many advantages:"

    They list 5 positives about using local models. On a blog targeting developers, I would wish, if not expect, them to list the downsides and weigh the two sides too. As it is, it’s promotional material, not an honest, open, fully informing description.

    While they go into technical details about the architecture and technical implementation, I think the negatives are noteworthy, and the weighing could be insightful for readers.


    "So every time an image is added, we get an array of pixels we pass to the ML engine"

    An array of pixels doesn’t make sense to me. Images can have different widths, so a linear array whose row boundaries fall at different positions for every image would be awful for training.

    I have to assume this was a technical simplification or unintended wording mistake for the article.
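
For what it’s worth, a flat pixel array is workable as long as the width and height travel with it; the canvas ImageData object, for example, is a flat row-major RGBA buffer plus a width and a height. Below is a minimal sketch of that idea, not taken from the Firefox code: `pixel_at` and `resize_nearest` are hypothetical helpers, and the nearest-neighbour resize to a fixed size only stands in for whatever preprocessing the real pipeline does.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, ...]  # (R, G, B, A)


def pixel_at(data: Sequence[int], width: int, x: int, y: int) -> Pixel:
    """Look up one RGBA pixel in a flat, row-major buffer."""
    i = (y * width + x) * 4  # 4 channels per pixel
    return tuple(data[i:i + 4])


def resize_nearest(data: Sequence[int], width: int, height: int,
                   out_w: int, out_h: int) -> List[int]:
    """Resample to a fixed size so every image yields the same array length."""
    out: List[int] = []
    for oy in range(out_h):
        sy = oy * height // out_h  # nearest source row
        for ox in range(out_w):
            sx = ox * width // out_w  # nearest source column
            out.extend(pixel_at(data, width, sx, sy))
    return out


# Two images with different widths end up as arrays of identical length.
wide = [255, 0, 0, 255, 0, 0, 255, 255]   # 2x1: red, blue
tall = [0, 255, 0, 255, 0, 0, 0, 255]     # 1x2: green over black
print(len(resize_nearest(wide, 2, 1, 4, 2)), len(resize_nearest(tall, 1, 2, 4, 2)))
```

So the varying width only matters until the image is resampled; after that the model always sees the same layout.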

    • pheet@sopuli.xyz
      3 months ago

      Might be a significant issue if more applications adopt these kinds of features and can’t share the resources in a meaningful way.

    • IllNess@infosec.pub
      4 months ago

      It is for websites. This is most useful for readers that don’t display images. The feature for websites should be added for version 130. I’m on Developer Edition and I am currently on 127. It will be implemented for PDFs in the future after that.

      • Kissaki@beehaw.org
        3 months ago

        Where did you read this? The article says the opposite.

        "will be available as part of Firefox’s built-in PDF editor"

        "Firefox is able to add an image in a PDF using our popular open source pdf.js library[…] Starting in Firefox 130, we will automatically generate an alt text and let the user validate it."

        See also my other quotes in this comment.


        • IllNess@infosec.pub
          3 months ago

          What you quoted is about the feature for adding images to PDFs. It doesn’t work for existing PDFs that already contain images.

          "In the future, we want to be able to provide an alt text for any existing image in PDFs, except images which just contain text (it’s usually the case for PDFs containing scanned books)."

          That’s how I read it at least. I could be wrong.

    • Kissaki@beehaw.org
      3 months ago

      They’re starting this as an experiment in their PDF editor, yes. They then want to extend it to PDF reading, and then hope to extend it to general web browsing.

      "will be available as part of Firefox’s built-in PDF editor"

      "Firefox is able to add an image in a PDF using our popular open source pdf.js library[…] Starting in Firefox 130, we will automatically generate an alt text and let the user validate it. So every time an image is added, […]"

      "In the future, we want to be able to provide an alt text for any existing image in PDFs, except images which just contain text (it’s usually the case for PDFs containing scanned books)."

      "Once the alt text feature in PDF.js has matured and proven to work well, we hope to make the feature available in general browsing for users with screen readers."

  • IllNess@infosec.pub
    4 months ago

    "But even for a simple static page there are certain types of information, like alternative text for images, that must be provided by the author to provide an understandable experience for people using assistive technology (as required by the spec)"

    I wonder if this includes websites that use <figcaption> with alt emptied.

    • Kissaki@beehaw.org
      3 months ago

      The MDN pages for figure and figcaption make no mention of changed img alt intentions. Which makes sense to me.

      figure does not invalidate or change how img is to be used. The caption often won’t differ from the image description, but it can. alt describes the image; figcaption captions it.

      What the fuck is Lemmy doing, breaking with HTML in code formatting?? Man it’s completely broken. I committed sth so it doesn’t remove the img lol.

      <figure>
        <img src="party.jpg" alt="people partying" />
        <figcaption>Me and my mates</figcaption>
      </figure>
      
      • IllNess@infosec.pub
        3 months ago

        Yes you can use both but I’ve seen some front end developers blank out alt altogether when they are using figcaption.

        I did not find this practice in MDN Web Docs but I found it in another place:

        "If you’re using an image that has a caption, it may not need alt text if the caption contains all of the relevant visual information."


        I was just wondering what Mozilla’s method was for finding these images and whether they took other things into consideration, like decorative images.
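
To make the distinction concrete, here is a rough sketch of how a tool could sort images into those cases. It is not Mozilla’s method (the article doesn’t describe one); the `AltAudit` class, the `audit` helper and the status names are all made up for illustration, and it only uses Python’s standard `html.parser`.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Classify every <img> by its alt attribute and figure/figcaption context."""

    def __init__(self):
        super().__init__()
        self.images = []    # one dict per <img>: {"src": ..., "status": ...}
        self._figures = []  # stack of currently open <figure> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._figures.append({"images": [], "has_caption": False})
        elif tag == "figcaption" and self._figures:
            self._figures[-1]["has_caption"] = True
        elif tag == "img":
            if "alt" not in attrs:
                status = "missing"      # no alt attribute at all
            elif (attrs["alt"] or "").strip():
                status = "described"    # author wrote an alt text
            else:
                status = "empty"        # alt="" (conventionally decorative)
            image = {"src": attrs.get("src"), "status": status}
            self.images.append(image)
            if self._figures:
                self._figures[-1]["images"].append(image)

    def handle_endtag(self, tag):
        # The caption may come after the image, so decide when the figure closes.
        if tag == "figure" and self._figures:
            figure = self._figures.pop()
            if figure["has_caption"]:
                for image in figure["images"]:
                    if image["status"] == "empty":
                        image["status"] = "empty-but-captioned"


def audit(html):
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return [(img["src"], img["status"]) for img in parser.images]
```

Calling `audit(page_html)` returns one `(src, status)` pair per image. A generator would presumably only want to touch the "missing" ones, leave "empty" ones alone as decorative, and make a judgment call on "empty-but-captioned".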

  • leanleft@lemmy.ml
    3 months ago

    There are way more companies who want to text-mine user content than there are blind people using the internet to read my content.