• A_norny_mousse@feddit.org
    English
    arrow-up
    30
    arrow-down
    1
    ·
    2 days ago

    As soon as you leave the big languages, esp. English, Wikipedia can be very problematic for all sorts of reasons.
    Mostly because of a lack of eyeballs.
    But it doesn’t end with merely badly written/generated content but also with narrative manipulation that - unlike in the English version - remains unchallenged.

    • vacuumflower@lemmy.sdf.org
      English
      arrow-up
      4
      arrow-down
      1
      ·
      edit-2
      1 day ago

      Sorry, but English-speaking countries have basically invented “narrative manipulation”. For most of history it was normal that there are many competing narratives from interested parties on anything. But such sophistication at making one side’s narrative seem impartial, perpetually contested and self-healing has never been achieved before.

      It’s as if you paint a lake red, it’s expensive, and people may get used to it and even believe that’s kinda normal, but one can still see that it’s just one lake. If you paint the world oceans red, so that it rains red and mists red, that’s far more persuasive, and that’s what the “collective West” has achieved.

      To make a lake painted red seem normal, you need to prevent most of your population from looking at other lakes. But when you’ve managed to paint the ocean red, you don’t need to limit them at all. The fence and the punishment would hurt trust, but without them your own and other people looking at the red oceans and rains will think they are also free.

      Despite being just one alliance of former and current colonizing powers on this planet.

      It’s very sad to live in an era of frustration where we can see that it can’t reform itself any further in a humanist direction than it already had by about 1988.

      Sort of like a planetwide revolutionary situation by Lenin, where the dominating powers can’t keep the order the old way (that persuasion still slowly dies), and the dominated can’t live the old way. But, as we know, revolutionary situations by Lenin generally don’t lead to what one would hope for.

      EDIT: Oh, I forgot. The point is that it’s actually nice sometimes to have alternative pages in smaller languages on niche subjects, explained better to my own taste. And in the bigger languages articles are sometimes removed for no good reason, say, Hotline\KDX have been butchered simply for being not popular anymore.

      • A_norny_mousse@feddit.org
        English
        arrow-up
        1
        ·
        edit-2
        1 day ago

        Sorry, but English-speaking countries have basically invented “narrative manipulation”.

        You have no idea how wrong you are. I could claim it was the Roman Catholic Church and there’d probably still be older examples. More likely, no entity “basically invented” it.

        Nothing against you personally, but this is not the edgy take you think it is.

        Oh, I forgot. The point is that it’s actually nice sometimes to have alternative pages in smaller languages on niche subjects, explained better to my own taste.

        No, the point is that there are countries where people speak these languages and they want to read things in their own language. Sheesh.

        • vacuumflower@lemmy.sdf.org
          English
          arrow-up
          1
          ·
          1 day ago

          “Basically invented” here meant the last huge leap, to having one cluster of narratives perceived as “normal and clean” almost globally. Of course not literally. The Roman Catholic Church didn’t have such means.

  • chloroken@lemmy.ml
    English
    arrow-up
    5
    arrow-down
    5
    ·
    1 day ago

    It’s profoundly chauvinistic to think that people who speak other languages don’t have the same depth of literary resource as English-speakers because Wikipedia has fewer users.

    Books. They’re called books. Every nation speaking every language has them.

    • HereIAm@lemmy.world
      English
      arrow-up
      10
      ·
      1 day ago

      I understand you’re trying to be nice to minority languages, but if you write research papers you either limit your audience to your own country, or you publish in English (I guess Spanish is pretty worldwide as well). If you set out to read a new paper in your field, I doubt you’d pick up something in Mongolian.

      Even in Sweden I would write a serious paper in English, so that more of the world could read it. Yes, we have textbooks for our courses that are in Swedish, but I doubt there are many books covering LLMs being published currently, for example.

      • chloroken@lemmy.ml
        English
        arrow-up
        2
        arrow-down
        2
        ·
        edit-2
        1 day ago

        I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.

        As for scientific papers, it’s called a translation. One can write academic literature in one’s native language and have it translated for more reach. That isn’t the case with Wikipedia, which is constantly being edited.

        • HereIAm@lemmy.world
          English
          arrow-up
          8
          ·
          1 day ago

          No one is saying those who can’t access or read English Wikipedia are inferior. The issue here is when what is in a non-English Wikipedia article is misleading or flat-out harmful (as the article says about growing crops), because of juvenile attempts at machine translation getting it very wrong. So what Greenland did was shut down its poorly translated and maintained wiki site instead of letting it fester with misinformation. And this issue compounds when LLMs scrape Wikipedia as a source to learn new languages.

        • Alaknár@sopuli.xyz
          English
          arrow-up
          3
          ·
          edit-2
          1 day ago

          I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.

          I think you missed the problem described here.

          The “doom spiral” is not because of English Wiki, it has nothing to do with anything.

          The problem described is that people who don’t know a “niche” language try to contribute to a niche Wiki by using machine translation/LLMs.

          As per the article:

          Virtually every single article had been published by people who did not actually speak the language. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes—from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves.

          Now, another problem is Model Collapse (or, well, a similar phenomenon strictly in terms of language itself).

          We now have a bunch of “niche” languages’ Wikis containing such errors… that are being used to train machine translators and LLMs to handle these languages. This is contaminating their input data with errors and hallucinations, but since this is the training data, these LLMs consider everything in there as the truth, propagating the errors/hallucinations forward.

          I honestly have no clue where you’re getting anything chauvinistic here. The problem is imperfect technology being misused by irresponsible people.

          • AA5B@lemmy.world
            English
            arrow-up
            1
            arrow-down
            2
            ·
            1 day ago

            Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results…

            And AI training trusting everything it reads is a larger systemic issue, not limited to this niche.

            Perhaps part of the solution is machine-readable citations. Maybe a search engine or AI could provide better results if it knew what was human generated vs machine generated. But even then you have huge gaps on one side with untrustworthy humans (like comedy) and on the other side with machine generated facts such as from a database.

            • Alaknár@sopuli.xyz
              English
              arrow-up
              3
              arrow-down
              1
              ·
              1 day ago

              Spreading knowledge via machine translation where there are no human translators available has to be better than not translating

              Have you not read my entire comment…?

              One of the Greenlandic Wiki articles “claimed Canada had only 41 inhabitants”. What use is a text like that? In what world is learning that Canada has 41 inhabitants better than going to the English version of the article and translating it yourself?

              Perhaps part of the solution is machine readable citations

              The contents of the citations are already used for training, as long as they’re publicly available. That’s not the problem. The problem is that LLMs do not understand context well, they are not, well, intelligent.

              The “Chinese Room” thought experiment explains it best, I think: imagine you’re in a room with writing utensils and a manual. Every now and again a letter falls into the room through a slit in the wall. Your task is to take the letter and use the manual to write a response. If you see such and such shape, you’re supposed to write this and that shape on the reply paper, etc. Once you’re done, you throw the letter out through the slit. This goes back and forth.

              To the person on the other side of the wall it seems like they’re having a conversation with someone fluent in Chinese whereas you’re just painting shapes based on what the manual tells you.

              LLMs don’t understand the prompts - they generate responses based on the probability of certain characters or words or sentences being next to each other when the prompt contains certain characters, words, and sentences. That’s all there is.

              There was a famous botched experiment where scientists were training an AI model to detect tumours. It got really accurate on the training data so they tested it on new cases gathered more recently. It gave a 100% certainty of a tumour being present if the photograph analysed had a yellow ruler on it, because most photos of tumours in the training data had that ruler for scale.

              But even then you have huge gaps on one side with untrustworthy humans (like comedy) and on the other side with machine generated facts such as from a database

              “Machine generated facts” are not facts, they’re just hallucinations and falsehoods. It is 100% better to NOT have them at all and have to resort to the English wiki, than have them and learn bullshit.

              Especially because, again, the contents of Wikipedia are absolutely being used for training further LLMs. The more errors there are, the worse the models become, eventually leading to a collapse of truth. We are already seeing this with whole “research” publications being generated, including “source” material invented on the spot, proving bogus results.

            • DoPeopleLookHere@sh.itjust.works
              English
              arrow-up
              1
              ·
              edit-2
              1 day ago

              Is it even getting misused? Spreading knowledge via machine translation where there are no human translators available has to be better than not translating. As long as there is transparency so people can judge the results

              Assumes the AI is accurate, which is debatable

              Also how do you do citations on a translation?

              It’s an interpretation, not a fact

              • AA5B@lemmy.world
                English
                arrow-up
                1
                arrow-down
                1
                ·
                1 day ago

                Sure there are limitations. The point still stands: an imperfect machine translation is better than no translation, as long as people understand it is.

                Can we afford to allow a high bar to deprive people of knowledge just because of the language they speak?

                The article complains about the effect of poor machine translations on languages, but the effect of no translations is worse. Yes, those Greenlanders should be able to read all of Wikipedia without learning English, even if the project has no human translators.

                • Euphoma@lemmy.ml
                  English
                  arrow-up
                  3
                  ·
                  1 day ago

                  Wikipedia already has a button that takes you to another language’s version of the page, which you can then machine translate yourself.

                • DoPeopleLookHere@sh.itjust.works
                  English
                  arrow-up
                  2
                  ·
                  23 hours ago

                  Yes those Greenlanders should be able to read all of Wikipedia without learning English and even if the project has no human translators

                  Again, you’re assuming a high level of accuracy from these tools. If LLM garbage leaves it unreadable, is that actually better?

  • Bloefz@lemmy.world
    English
    arrow-up
    5
    arrow-down
    9
    ·
    edit-2
    1 day ago

    Does it really matter? I think the extreme number of languages in the world right now is not helping us communicate. I don’t view language as a cultural heritage thing, just a communication protocol. And I have moved around a lot in the world; it’s very difficult to be constantly adapting to different languages. That causes a societal integration barrier for me.

    I think if we had a universal language (note that it wouldn’t have to be English) we would be able to understand each other better and have fewer wars.

    PS: I’m not advocating to ban languages or something, just to have a universal one. A bit like what Esperanto tried to achieve. Mutual language means more mutual understanding and thus less “us vs them” underbelly feelings that the fascists thrive on.

      • Bloefz@lemmy.world
        English
        arrow-up
        2
        arrow-down
        1
        ·
        edit-2
        1 day ago

        Yeah, I’m just not really wed to any language. I guess it is also because I have moved around so much. I’m from Holland but I don’t consider myself a Dutch person, more like a citizen of the world. I’ve become too different to fit in in my home country (also because it’s become an extreme-right cesspool lately 😢 ). I’ve spent about half my life elsewhere. And in the places I’ve lived where I spoke the language, I fared noticeably better.

        Don’t forget that a lot of today’s problems center around not understanding each other. The hatred of immigrants for example.

        But I know a lot of people do view language as a cultural thing, it’s just my point of view.

    • TankovayaDiviziya@lemmy.world
      English
      arrow-up
      2
      ·
      edit-2
      16 hours ago

      Languages have their own quirks and characters, representative of a people’s cultural values and history, and express ideas not even present in other cultures. For these reasons, as many languages as possible should be preserved.