Which would show up in network traffic, which it doesn't. There is no need for it.
You could easily process the audio and filter it on the device, and just send the transcribed parts that match your filters for further processing. No need to send a constant audio stream. Eventually it would show up in the battery usage, but things like Google's song recognition also heavily use the mic and process audio, and it doesn't show up as heavy as you would expect.
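Roughly this shape, as a sketch (the transcribe/upload functions here are made-up placeholders, not any real API):

```python
import re

# Hypothetical on-device pipeline; transcribe_locally() and upload()
# are placeholders, not a real library.
KEYWORDS = re.compile(r"\b(nike|iphone|vacation)\b")

def transcribe_locally(audio_chunk: bytes) -> str:
    """Stand-in for an on-device speech-to-text model."""
    return ""

def upload(hits: list[str]) -> None:
    """Stand-in for shipping a few bytes of matches upstream."""

def process_chunk(audio_chunk: bytes) -> None:
    text = transcribe_locally(audio_chunk).lower()
    hits = KEYWORDS.findall(text)
    if hits:
        upload(hits)  # only the matched words ever leave the device
```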
Ok, real question: what exactly would show up in network traffic?
Evidence of voice collection and data transmission. Just hook it up to Wireshark and test. It's been done, and zero evidence produced. Not trivial to hide, neither in traffic nor battery use.
Btw, that independent article is from 2008; here's something a decade fresher: https://www.androidauthority.com/your-phone-is-not-listening-to-you-884028/
Why would they need to do it anyway? Far easier to just use the telemetry already there. Your phone knows more about you than you think already if you don't use privacy-respecting software. No need to use the microphone. But if you know something no one in the security field does, I'm all for seeing some evidence.
Never mind the why (I'm not entirely convinced it's being done), I want to know what exactly would be seen in network traffic.
Ok, you said "voice collection", which I'll assume means recording audio and then uploading it to some server. That's an astonishingly bonkers and inefficient way of doing it. You'd run a very small model (using something like TFLite) that's trained against a few hundred keywords (brand names, products, or product categories) and run it in the background of your app. Phones already do essentially this with assistant wake-word listening. Then, after a few hours of listening, compress the plain-text detection data (10 MB of plain text can be compressed to 1 MB) and just upload the end result. And we wouldn't be talking about megabytes, we'd be talking single-digit kilobytes. An amount that wouldn't even be a blip on Wireshark, especially since phones are so exceedingly chatty nowadays. Have you actually tried to wireshark phone traffic? It's just constant noise.
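To put numbers on the compression bit, here's a toy example (the payload shape is made up):

```python
import gzip
import json

# A few hours of keyword hits as timestamped plain text (made-up shape).
detections = [{"t": 1700000000 + i * 37, "kw": "sneakers"} for i in range(500)]
raw = json.dumps(detections).encode()

packed = gzip.compress(raw)
print(f"{len(raw)} bytes raw -> {len(packed)} bytes gzipped")
# Hundreds of detections still land in the single-digit-KB range.
```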
It’s entirely possible to do. But that doesn’t mean that it is being done.
It would cost trillions and halve the battery life. Just because you don't understand something doesn't make you right. Your entire argument is shattered in the link I provided you earlier. It's not a few KB that would be needed, and if done locally it's a huge battery eater. Not to mention that the cost of making any use of it would exceed the entire value of the ad market.
There are plenty of people who can find shit in the noise on Wireshark if there were anything like what you are suggesting.
Also, there is a teapot in orbit around Jupiter. Prove me wrong.
Lol. My dude, I’m a developer who specializes in AI.
> It would cost trillions

I have no clue how you came to that number. I could whip up a prototype in a few days (and partially have).
> half the battery life

Hardly. Does Google Assistant halve battery life? No, so why would this? Besides, you would just need to listen to the mic and record audio only when the sound is above a certain volume threshold. Then, once every few hours, batch-process the audio and send the resulting text data (in the KBs) up to a server.
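The volume gate is trivial, something like this (threshold and frame format are arbitrary here):

```python
import math
import struct

THRESHOLD = 500  # arbitrary; tune to the ambient noise floor

def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian PCM."""
    n = len(frame) // 2
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / max(n, 1))

def on_mic_frame(frame: bytes, buffer: list) -> None:
    # Silence never even hits storage; only audible chunks are kept,
    # to be batch-processed by the local model every few hours.
    if rms(frame) > THRESHOLD:
        buffer.append(frame)
```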
The average ad data that’s downloaded for in-app display is orders of magnitude larger than what would be uploaded.
> there are plenty of people that can find shit in the noise on wireshark

How are they going to see data that's encrypted and bundled with other innocuous data?
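I.e., the interesting bytes just ride along inside an ordinary-looking telemetry payload. A hypothetical shape:

```python
import json

# Made-up analytics payload; the keyword hits are just another field.
telemetry = {
    "session_id": "c0ffee",
    "screen_events": ["open", "scroll", "tap"],  # the usual noise
    "perf": {"fps": 59, "jank_frames": 3},
    "ad_ctx": ["sneakers", "vacation"],          # <- the interesting ~40 bytes
}
body = json.dumps(telemetry).encode()
# Goes out over TLS like every other request; on the wire it's just
# another encrypted POST of a few hundred bytes.
print(len(body), "bytes")
```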
Literally all your questions are answered in the link I pointed out twice now. Try it. "Hey Google" doesn't take much; 1k wake words take a lot more… your math doesn't add up anywhere close to reality.
I don’t have any questions. This is something I know a lot about at a very technical level.
The difference between one wake word and one thousand is marginal at most. At the hardware level the mic is still listening non-stop, and the audio is still being processed. It *has* to do that, otherwise it wouldn't be able to look for even one word. And from there it doesn't matter if it's one word or 10k; it's still processing the audio data through a model.
And that's the key part: it doesn't matter if the model has one output or thousands, the data still bounces through each layer of the network. The processing requirements are exactly the same (assuming the exact same model).
This is the part you simply do not understand.
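If you want it back-of-envelope (all shapes made up, just to show where the cost lives):

```python
# Back-of-envelope FLOPs for always-on listening; shapes are illustrative.
FRAMES_PER_SEC = 100   # 10 ms hop
HIDDEN = 128
LAYERS = 4

# The encoder runs on every frame no matter how many keywords you want.
encoder_flops = FRAMES_PER_SEC * LAYERS * 2 * HIDDEN * HIDDEN  # ~13.1 MFLOP/s

# The classifier head scores a pooled embedding, say once per second.
head_1    = 2 * HIDDEN * 1      # 256 FLOP/s for one wake word
head_1000 = 2 * HIDDEN * 1000   # 256 kFLOP/s for a thousand

print(encoder_flops, head_1, head_1000)
# The encoder dwarfs the head either way: going from 1 keyword to 1000
# adds about 2% on top of the cost you were already paying.
```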
Seems you don't, since you started your line with a question and continued to do so despite being provided with answers repeatedly. Is there some kink of roleplaying as an AI dev? You don't really seem to have done your homework for it.