I did a double-take last week when Micron Technology, one of the world’s largest memory chip makers, acquired artificial intelligence hardware and software startup Fwdnxt.

The move could be very interesting. If it bears fruit, Fwdnxt could bring Micron into direct competition with partners such as Intel and Nvidia, as Micron believes that memory and AI computing are converging into the same architecture.

It’s no accident that one of the people at Micron in charge of this project is Steve Pawlowski, a former Intel chip architect who holds dozens of patents. Pawlowski is now vice president of advanced computing solutions at Micron.

When combined with Micron’s memory chips, Fwdnxt (pronounced “forward next”) will enable Micron to explore deep learning AI solutions required for data analytics, particularly with internet of things and edge computing. Maybe it will make AI-based memory chips, or maybe memory chips that include AI.

Boise-based Micron is doing this work, said Sanjay Mehrotra, CEO of Micron, because “the compute architectures of yesterday are not suitable for tomorrow … In the long run, we think compute is best done in memory.” I interviewed Pawlowski at the Micron Insights event last week in San Francisco.

Here’s an edited transcript of our interview.

Above: Steve Pawlowski of Micron and Abhishek Chaurasia of Fwdnxt.

Image Credit: Dean Takahashi

Steve Pawlowski: When I left Intel in 2014, I came to Micron and they said, “What do you want to do?” I said, “I’m convinced that the convergence of compute and memory is necessary for performance efficiency and lower latency. You’re a memory company. You have the technology. DRAM is going to be around a while. I’d like to work on that.” They said, “OK.”

I have a small team that’s focusing on finding problems where compute and memory — we can start testing the concept, start getting concepts into products, but not increase the cost. One of the things I knew at Intel, I’ll never forget the story — we used to have math coprocessors. 80287, 80387. We made an obscene amount of money on the 387. We had this bright idea that we could do it faster and better if we integrated the coprocessor inside the 486. We did, and all of a sudden we didn’t have enough of a footprint. The people who didn’t need it said, “You’re not charging me for that die area,” and the people that did need it said, “You’re going to pay me the same as everyone else because I’m a favored customer.” Effectively that whole business went to zero.

The key learning there is that you can’t add more complexity and cost and expect people to pay for it right out of the chute. Not until there’s a significant majority that gets real value out of it. What we’re focusing on is finding the key things where people can get value out of it today, and then just see if you can expand that bubble over time. I look at it as an eight to 10 year journey. At the end of those years I may look back and realize I wasted them. Or I could look back and say, “Wow, we may not have gotten here, but we did OK.”

VentureBeat: That sparks a lot of imagination as to what could result from this, but are there some specific things that you would drop hints about?

Pawlowski: The one thing, and you’ve heard a lot about it here, is AI at the edge. The reason that we focus there is there isn’t an incumbent programming model or an incumbent architecture where you’re fighting a market battle. Everybody’s fighting to get into the same stall, so to speak. There’s an opportunity to go do something there. People don’t look at you and say, “Micron is a memory company. Why are you talking about this?” They look at it like — we have this capability in an FPGA with our high-performance memory and an architecture that maps on an FPGA. We take care of all the abstractions so you don’t have to become a VHDL programmer. Would you be willing to start working on problems with your data sets?

The interesting thing is, I haven’t really had to go push that. We’ve been showing up at FPGA conferences and things like that. Mainly government agencies have come and said, “We have a problem here. We’d like to kick the tires on this a bit more.” The problem with the government is they get excited early, but if you ever want to do something it takes so long. Procurement cycles are long. Contracts are long, and everything else.

We decided to look at the general market. There was an automotive company that came and said, “We’re not level five, but we can certainly get level three, level four autonomous vehicles where we want to be able to use the network to tell us what’s going on. This looks intriguing. Are you willing to work with us?” A lot of people inside said, “Why are they interested in working with you?” It’s because I don’t come in and tell them what they need to do. I say, “Here’s what we’ve got. What can we do for you?” They say, “OK, you’re willing to listen to us. Here’s our problem.”

I learned that lesson, believe it or not, in 2005, when AMD was coming out with Opteron. We were still pushing seven-gig processors, 33-stage pipelines, and nobody was going there. We went to Wall Street, and it’s one of those moments where you want to crawl in a shell, because they really lit in. But I said, “Can you give us another chance? Can we sit down and understand our workloads, work with you, and I’ll take that back and we can build better products?” And we did.

We turned a lot of them — UBS, I remember an op-ed they wrote where they said, “You may not build the biggest chip or the best chip, but you came and understood my problem.” It was really understanding the customer and their problem and what you can do. If you do it and it doesn’t help them, hey, you learned something.

Above: Micron

Image Credit: Dean Takahashi

VentureBeat: As far as narrowing it down, is it coming up with a new kind of memory, or is it figuring out where the processing is done?

Pawlowski: The answer is yes. But it’s really understanding the dynamic. By the way, it depends on the model. I was just talking to someone down there about how some language models need 100 gigabytes for the parameters. When you see someone who says, “Hey, I’ve got two gigabytes, four gigabytes,” that’ll fit most models, but not all of them. The models are really evolving.

It depends on the latency of your solution, too. I don’t know if you saw the OHSU video down there where the lady had breast cancer. They need lots of data, because they want to put all the electron microscopy images together and build a 3D convolutional model, a 3D representation of the tumor. They don’t have enough time to go across, because they want actionable insight in a day or even an hour. The work we’re doing with CERN, we need the data now. We have to make decisions in microseconds. Is this something interesting or do we drop it on the floor?

Different solutions require different types of memory. What we’re learning is — the one thing I always liked about Intel, I knew what the instructions were from the program. I knew how they got executed in the machine and went out to the system. When I came to Micron, the only thing I saw were addresses and commands. Read/write command and an address. I had no understanding of — is this thing copying 15 different things to different elements here, or overwriting, or what? Having the company we’ve been working with and acquired in June — that architecture allows us to build these algorithms, run them, and see how the entire impact is.

Above: Micron 7300 SSDs

Image Credit: Micron

Our first goal is, what can we do in the memory storage to actually improve the time to a solution? We can always build higher bandwidth, but that may not necessarily be what’s going to get you there. Are there things you can do like scatter tensor arrays? If we could build a buffer that brought in a matrix and allowed us to shift the matrix over in one fell swoop, rather than just having the thing go and search for it, there’s potentially a big benefit there.

Eventually we’re also looking at — most of these are multiply and accumulate architectures, and very simple ones. They’re just replicated thousands of times. You can actually build a pretty good multiply and accumulate in a memory device once the transistors get just a little better. Eventually, can you take that architecture and then put it in a memory device itself? That’s the long-term vision.

What I want to do is, whatever we do, we build a programming infrastructure and a paradigm so that people don’t have to rewrite their code every time they go through a migration. In my mind, that was Intel’s great success. When we did 386, there was no 32-bit software. But it sure ran the 16-bit code really well. People bought it for that. You got a number of platforms out there, and then people said, “OK, now we’ll go and optimize for 32 bits.” When 486 came out six to eight years later, there was software to take advantage of it and it became a machine that never looked back.

Start with memory first, storage first, what we can do there. Then we’ll see what can actually migrate over time. The answer may be nothing. The answer could be everything. I think it’s somewhere in the middle. It just depends on where you move the needle.
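
For readers unfamiliar with the multiply-and-accumulate pattern Pawlowski mentions, here is a minimal Python sketch of the idea. It is an illustration only and does not represent Micron’s or Fwdnxt’s hardware or software; the function names `mac`, `dot`, and `matvec` are hypothetical. It shows how one very simple operation, repeated many times, produces the matrix-vector products that neural network layers rely on.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: add the product a * b to acc."""
    return acc + a * b


def dot(xs, ys):
    """Dot product built from repeated MAC steps."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac(acc, a, b)
    return acc


def matvec(matrix, vector):
    """Matrix-vector product: one independent dot product per row.

    Each row's result depends only on that row and the vector, which is
    why hardware can replicate the MAC unit many times and run the rows
    in parallel.
    """
    return [dot(row, vector) for row in matrix]


if __name__ == "__main__":
    weights = [[1, 2], [3, 4]]
    inputs = [5, 6]
    print(matvec(weights, inputs))
```

In software the rows are processed one after another; the appeal of the approach described in the interview is that each row could, in principle, be computed next to where its data is stored.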