In 30 years, we’re going to look back at this headline like we look back at articles about the internet or smart phones being fads.
They are discussing a very specific approach and a paper that lays out the issues with pursuing this one specific type of generative AI. It’s not about AI in general. The headline is a bit click-baity.
While the paper demonstrated strong diminishing returns from adding more data to modern neural networks for image classifiers, the video host is explaining how the same effect may apply to any neural-network-based system with modern transformers.
While there are technically methods of generative AI that don’t use a neural network, they haven’t made much progress in recent decades and aren’t what most people mean when they hear or say generative AI. As such, I would say the title is accurate enough for a video meant for a general audience, though “Is there a fundamental limit to modern neural networks” might be more technically correct.
That’s a bold prediction.
I think most people underestimate how big of a deal it’s going to be when this tech is pervasive in things like search engines or digital assistants. There are many times when I can’t figure out the right combination of words to put into a search engine to find the results. ChatGPT is already my go-to when I want to figure out a movie or song from some random combination of foggy memories. Imagine, after 10 more years of CPU/GPU innovations and chat applications that have actually been designed for information retrieval, how much that is going to transform how we interact with data and information.
Full disclosure, I didn’t watch the video. I just can’t imagine that that headline isn’t going to look silly in 30 years.
LLMs are going to change how we interact with data and information, but not the way you think. AI-generated spam will completely ruin the whole concept of internet search. The only information we can trust is going to be human-curated.
You will need an LLM to tell that apart, so… 🤷
There are diminishing returns in semiconductor photolithography. Moore scaling is long over; absolute real estate (see wafer-scale integration, WSI, with Cerebras), data center costs, and the power envelope are all sending a clear message. Quantization is there, so you can go from digital multipliers to analog and go to spiking networks, but transformers and co. have little power there.
Also, the kind of economy that can carry Gen AI as a business model is not a given, long term.
Neuromorphic hardware is going to jump many orders of magnitude over classic hardware. When we get a RAM that can execute multiple layers in parallel at once, per clock tick, we’ll see whole AI ecosystems cooperating to get a solution in a fraction of the time a single modern NN would take.
Yes, orders of magnitude, but not too many of them. The real estate of a 300 mm wafer is limited, the structure shrink is saturating, and you can’t get too many layers. You still need a packet-switched network on the wafer even if the rest is mostly analog. Perhaps spintronics can limit the power requirements too.
The orders of magnitude will come from the RAM running a whole layer at once in “a single clock”, without the need for a processor to execute any of it. It’s conceivable that multiple layers could be written/“programmed” into neuromorphic RAM, then a processor could just write the inputs, send an execute, move data from outputs to the next inputs, and repeat for all layers.
For example, an NVIDIA A100 goes up to 1,200 INT8 TOPS with 80 GB of RAM at 1500 MHz… but if the RAM could execute a neural network directly, that could raise it up to 80G × 1.5G = 120,000,000 INT8 TOPS, or 5 orders of magnitude.
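To make the arithmetic above easy to check, here is a rough back-of-envelope sketch. It takes the quoted figures at face value and assumes, very optimistically, that every byte of RAM performs one INT8 operation per clock tick; the variable names are only illustrative.

```python
import math

# Figures as quoted in the comment above (not independently verified here).
A100_INT8_TOPS = 1_200          # quoted peak INT8 throughput, in TOPS
RAM_BYTES = 80 * 10**9          # 80 GB of RAM
CLOCK_HZ = 1_500 * 10**6        # 1500 MHz

# Assumption: each byte of RAM does one INT8 operation per clock tick.
in_memory_ops_per_s = RAM_BYTES * CLOCK_HZ
in_memory_tops = in_memory_ops_per_s // 10**12   # 1 TOPS = 10^12 ops/s

# Ratio between the hypothetical in-memory figure and the quoted A100 figure.
speedup = in_memory_tops / A100_INT8_TOPS
orders_of_magnitude = math.log10(speedup)

print(f"in-memory estimate: {in_memory_tops:,} TOPS")
print(f"speedup: {speedup:,.0f}x (~{orders_of_magnitude:.0f} orders of magnitude)")
```

This is an upper bound under that one-op-per-byte assumption; it says nothing about cooling, interconnect, or how much of the RAM would actually hold active weights.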
A free-running cellular automaton (CA) approach in hardware would work, but each cell would be a much souped-up SRAM cell, and the interactions would all be local and 2D. Considering Cerebras has 40 GB of SRAM on the 300 mm WSI and is about at the cooling limit, I’m afraid you do not have 5 orders of magnitude. Perhaps reversible spintronics can help with the power draw, but you still have to splat a higher-dimensional network (so not just local interactions) into a 2D array.
Current research points to memristors, which can work both as memory cells and as weights in an n×m grid representing a fully connected n->m layer that executes in 1 clock. I forgot which company has been showing prototypes since pre-covid… and now Google is so full of wannabes that I can’t seem to find it, oh well.
Cerebras is at the limit of SRAM, that’s true.
Spintronics could be the next step, but seems to be way less ready for production.
Higher dimensionality would be nice, but even at 2D, being able to push multiple processes at once through multiple n×m layers would already give those 5 orders of magnitude, at least for inference. Since training also involves an inference step, it would speed that up too, just not as much.
Self-training would be the next step after that… I don’t think I’ve seen research in that regard, but maybe I’ve just missed it.
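For anyone wondering what the n×m memristor grid and the "write inputs, execute, move outputs to the next inputs" loop amount to, here is a minimal software sketch. It is an idealized model, not how any real device works: `crossbar_layer` and `run_layers` are made-up names, the weights are plain numbers standing in for conductances, and it ignores activation functions, analog noise, and conversion between analog and digital.

```python
def crossbar_layer(weights, inputs):
    """Idealized n-input, m-output crossbar.

    weights[i][j] stands in for the conductance linking input row i to
    output column j. Each column sums input_i * weight_ij, so the whole
    grid computes one matrix-vector product in a single step.
    """
    n, m = len(weights), len(weights[0])
    if len(inputs) != n:
        raise ValueError("input length must match the number of rows")
    return [sum(inputs[i] * weights[i][j] for i in range(n)) for j in range(m)]


def run_layers(layers, inputs):
    """Feed the outputs of each pre-programmed layer into the next one.

    The 'processor' only moves data between layers; 'ticks' counts one
    step per layer, regardless of the layer's size.
    """
    ticks = 0
    for weights in layers:
        inputs = crossbar_layer(weights, inputs)
        ticks += 1
    return inputs, ticks


# Two tiny layers: 3 inputs -> 2 outputs, then 2 inputs -> 1 output.
layers = [
    [[1, 2], [3, 4], [5, 6]],
    [[1], [1]],
]
outputs, ticks = run_layers(layers, [1, 1, 1])
print(outputs, ticks)  # [21] 2
```

The point of the sketch is the tick count: in this idealized model the cost grows with the number of layers, not with n×m, which is where the claimed speedup for inference would come from.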