On the Weekly Briefing podcast: AI is the biggest story in the electronics industry, and by several measures SambaNova ranks among the biggest AI companies. An exclusive interview with SambaNova chief technologist Kunle Olukotun, who talks about what AI can do, AI supercomputers, and something called “dataflow threads.”

[FULL TRANSCRIPT AVAILABLE BELOW]

BRIAN SANTO: I’m Brian Santo, EE Times editor-in-chief. You’re listening to EE Times On Air.

Developing the technology for artificial intelligence is a dynamic field in and of itself, but the whole point of AI is to enable innovation in just about every other electronics application out there.

The potential markets for AI are about as limitless as the prospects for any technology seen in recent years, and that has gotten investors extraordinarily excited.

Every year EE Times publishes a rundown of some of the hottest startups in the electronics industry, called the Silicon 100. And every year for the past few years, it has included a boggling number of AI startups, most of them attracting tens of millions of dollars in funding.

A handful of AI startups are attracting not tens of millions of dollars, but hundreds of millions. A smaller number still have attracted over a billion dollars in investor cash. SambaNova is one of them. The company is now considered to be worth more than $5 billion overall, which makes it more valuable than many of the companies in the Fortune 500. It’s an interesting position to be in for a startup that emerged from stealth mode only two years ago.

Venture funding in no way guarantees success, but pulling in a billion dollars at the very least establishes that a lot of investors think SambaNova is headed for success.

The company’s front man is hyperkinetic CEO Rodrigo Liang. But SambaNova is going to succeed or fail on its merits, and the person most responsible for those merits is chief technologist Kunle Olukotun.

Olukotun is credited with designing the first general-purpose multi-core CPU, and he did pioneering work on single-chip multiprocessor and multi-threaded processor design.

I started by asking him to describe what SambaNova is all about.

KUNLE OLUKOTUN: Well, first of all, SambaNova is focused predominantly on the data center, so we’re focusing on both training and serving models from the data center. And the whole goal, of course, is to provide the capability to train very, very large, accurate models.
If you look at the current landscape of computing capabilities, mostly dominated by GPUs, you need many, many GPUs because of the limited amount of memory each GPU can have.

And so what SambaNova brings to the table is the ability, in one or two or a quarter of a rack, to provide terabytes of memory. That allows you to build huge models that can serve any of the particular industrial or commercial verticals that are of interest.

So, for instance, huge natural language processing models for the financial sector, or for developing chatbots, customer service, and voice-based commerce. Natural language processing also finds use in cancer research.

Or huge vision models, what we call true resolution, that allow you to work on medical images without reducing the resolution and making the image blurrier just so it fits into the memory requirements of conventional systems.

Conversely, the other thing people do to manage limited memory is chop a maybe 20K-by-20K image into patches, and then each of these patches can be processed. But you process the patches independently, and you potentially lose important features across the boundaries. By being able to process the 20K-by-20K image and create models that take in the full resolution of the image, you can get the fine detail of the cancer, or whatever it is you’re trying to image, and you can also get the features that cross a large swath of the image and use those to improve the accuracy of whatever you’re trying to do.

So in medical imaging, astronomy, scientific imaging, X-ray imaging: all of these are generating huge images that cannot be easily handled with today’s capabilities. And we are able to do that with the SambaNova systems.

So, many kinds of use cases. A third classic use case is large recommendation systems. What you want to do is accurately capture the personalities of the potential customers you’re trying to make recommendations for. And that comes in the form of what are called huge embedding tables, which capture this kind of detail. These tables run into the terabytes.

And again, you want to be able to handle those large embedding tables, because the larger the embedding table, the more accurate the model, and the better the recommendations you’re able to give.
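To make that scale concrete, here is a back-of-the-envelope sketch in Python. The row count, vector width, and GPU memory figure are hypothetical, chosen only to illustrate why such tables reach terabytes; SambaNova has not published these particular numbers.

```python
# Rough sizing of a recommendation-model embedding table.
# All numbers are hypothetical, chosen only to illustrate the scale.

num_ids = 2_000_000_000   # e.g., users, items, and categorical feature values
embedding_dim = 256       # vector length per ID
bytes_per_value = 4       # float32

table_bytes = num_ids * embedding_dim * bytes_per_value
print(f"Embedding table: {table_bytes / 1e12:.1f} TB")   # ~2.0 TB

# A single GPU with 80 GB of HBM holds only a small slice of this,
# which is why such tables are usually sharded across many devices.
gpu_hbm_bytes = 80e9
print(f"GPUs needed just to hold the table: {table_bytes / gpu_hbm_bytes:.0f}")
```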
And so all of these capabilities are provided by the SambaNova system. One of the key features we’ve focused on in developing the system is the ability both to train these large models (and these large models will give you state-of-the-art accuracy) and to serve them.

And this has two great benefits. One is, you serve what you train, right? So you don’t have to requalify the model in a different serving environment. Typically, today, what happens is that you train on GPUs, because GPUs are very good at large-batch training. When you talk about training, you want many examples in the batch, because that’s the only way to fill up the capabilities of the GPU so that it runs efficiently.

But if you make the batches too small, the GPU becomes very inefficient. So what people do is say, GPUs are no good at small batches, so we’ll move to CPUs, because when it comes to inference, you get one request at a time. The requests don’t come in batches; or, if you wait until you’ve got a batch of requests, then of course you’ve delayed the requests that came in first, so your latency is going to get worse.

And so what you want is the capability to do large-batch training very efficiently, but also single-batch inference very efficiently. We can do that with the SambaNova systems. So think of SambaNova systems as having this capability of doing training and inference very efficiently. And then the real full circle: Once you can do training and inference on the same platform, you can dynamically switch between them.

So there’s a new class of models being developed that allow you to do what’s called continuous training, where the distribution of the data you get changes, and you want to adapt. An example would be: Suppose you had a camera looking at an intersection somewhere in the Northeast. The scene at that intersection is going to look very different during the summer months than during the winter months. And you want to be able to shift your model to accommodate snow or ice or rain, as opposed to the nice sunny day it may have been trained on.
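As a rough illustration of that switching, here is a toy sketch in ordinary PyTorch, not SambaNova code; the model, shapes, and schedule are all made up. Continuous training amounts to interleaving inference steps with occasional training steps on the same live model:

```python
import torch
from torch import nn

# Hypothetical toy setup: one model instance both serves predictions and
# keeps adapting as the input distribution drifts (e.g., summer -> winter).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def serve(x):
    """Single-request (batch-of-one) inference."""
    model.eval()
    with torch.no_grad():
        return model(x.unsqueeze(0)).argmax(dim=1).item()

def adapt(x_batch, y_batch):
    """One training step on freshly labeled data from the new regime."""
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()
    opt.step()

# Interleave the two: serve requests, occasionally fold in new examples.
for step in range(100):
    prediction = serve(torch.randn(16))          # inference path
    if step % 10 == 0:                           # periodic adaptation
        adapt(torch.randn(32, 16), torch.randint(0, 2, (32,)))
```

The point of the sketch is only that both paths run against one set of weights; doing each path efficiently on one platform is the hardware claim being made here.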
And so it turns out that you could try to train one über model, but it turns out to be much better if you can have a model that adapts. But what does that require? It means that you need to be able to do training and inference very efficiently. So this all points to what we’re calling convergence.

Now, you might say, very low-power inference: Well, you can imagine that if you’ve got some mobile devices that need to do inference, you really care about low power. But in many cases, what you have is some edge compute capability in a server, which is closer to the edge, or you actually go all the way back to some data center. In either case, what you want is the ability to do both training and inference at the same time.

And so that’s the way we think of SambaNova: as something that provides that compute capability, but also more.

BRIAN SANTO: Well, is the ability to do both simultaneously rare or unique? Where this question comes from is: You look at the largest GPU vendors, and you see them getting into CPUs and doing a lot of research into figuring out how to move data back and forth rapidly and efficiently between processing and memory. The same thing with some of the largest CPU companies getting into GPUs and also experimenting, trying to figure out ways to handle processing and memory and ease that exchange. And they’re making advances.

The question, I think, for someone who wants to do AI but doesn’t know much about it would be: What’s the distinction? Is something built for AI going to do it better? Will the ability to do both inference and training simultaneously be interesting for my application? Or (and perhaps “and”) does cost become a factor in evaluating whether to go with traditional GPU-CPU computing versus something purpose-built for a specific problem space?

KUNLE OLUKOTUN: Yeah, well, I think all these issues are important. Fundamentally, when thinking about AI models and developing AI software, it’s all done using frameworks: either PyTorch or TensorFlow. PyTorch seems to be gaining the most traction recently, both among researchers and in industry. But nonetheless, both of these frameworks represent machine learning models as what we call dataflow graphs.

And so the dataflow graphs represent the model in terms of kernels. The kernels could be matrix multiply, they could be convolution, they could be pooling, they could be whatever. And the data that moves between these kernels is the tensor data that represents the model itself.

And so the focus we have in the design of SambaNova is to take our cue from the dataflow expressed in these models, and very efficiently map that in space and time onto the architecture. The application program gave you the roadmap, but that roadmap is ignored by most conventional architectures, CPUs and GPUs. We say: You’ve given us information here; let’s use it to optimize the dataflow and match the dataflow as represented in the application to the architecture.
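You can see that graph representation directly in a framework. Here is a minimal PyTorch sketch (a toy model invented purely for illustration): tracing it with torch.fx prints exactly the kernels-plus-tensor-edges structure Olukotun describes.

```python
import torch
import torch.fx as fx
from torch import nn

# A toy model; its forward pass is what the framework sees as a dataflow
# graph: kernels (conv, relu, pool, matmul) connected by tensor edges.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(8 * 15 * 15, 10)   # assumes 3x32x32 inputs

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.flatten(1))

# torch.fx extracts the graph explicitly: each node is a kernel, and the
# edges show which tensors flow between them.
gm = fx.symbolic_trace(Toy())
print(gm.graph)   # placeholder -> conv -> relu -> pool -> flatten -> fc -> output
```

A conventional processor executes these nodes one at a time; the argument here is that the graph itself tells you how to lay the computation out in space.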
So we have what we call a reconfigurable dataflow architecture that can minimize the bandwidth that gets generated, especially off chip. And it maximizes the use of the compute, because it gives you the ability to do what’s called pipelining in space on the chip, such that you can keep all the compute units of the chip fully utilized.

If you think about the key elements of any compute environment: What is your compute capability? What is your memory bandwidth, both on chip and off chip? What is your communication bandwidth, both on chip and to other units? And then, how effectively do you use all of that?

And so, with the ability to control both the compute and the communication, driven by the dataflow graph, we have the SambaFlow software development environment, which includes very sophisticated compilers that look at the dataflow graph and match it to the capabilities of the architecture, because the architecture is fundamentally reconfigurable and so can be reconfigured to match the properties of the application you’re running.

BRIAN SANTO: Wild. What does this mean for the application developer? One of the traditional attractions of working with a very popular CPU or GPU is that you’ve got all those tools already there. A lot of people are either very familiar with them, or, if they’re not, there are all sorts of ways they can learn. In AI, is that a common experience with SambaNova? Is it different for other AI companies?

KUNLE OLUKOTUN: Yeah, well, I think there are different kinds of developers. There are AI application developers, and there are system developers who like to delve into the details and optimize performance at a low level, who are conversant with low-level programming libraries in C and may even be willing to write CUDA for GPUs.

BRIAN SANTO: For huge applications like finance, there’s incentive to do that, right?

KUNLE OLUKOTUN: Right, right. But at the end of the day, you ask anybody who is trying to achieve performance, and they’d like to achieve that performance in the simplest and easiest way possible. Most people are forced to write CUDA. And once they write CUDA, they’re locked into Nvidia hardware. But that’s not what most AI developers want to do.

Most AI developers want to develop their application in Python using these frameworks. And in fact, the overwhelming majority of models and applications are developed at this high level in Python. So what we want to do, at the end of the day, is make it easy to use our capabilities. And that is enabled by a software stack we call SambaFlow, which takes those models; analyzes the computation, communication, and memory usage; and then optimizes for both memory locality and compute, such that you map the requirements of the application onto this architecture, and then you reconfigure the architecture to get the best performance.
And all of that happens automatically. So you, as the developer: If you really, really care about low-level performance optimization, and you really have the capability to do that, which few performance-oriented engineers do, we can give you a low-level programming model in C that allows you to actually program at the dataflow level. But that’s for 1% of the developers. The other 99% are going to be quite happy using the frameworks, because the compiler can deliver the performance.

The reason people had to write CUDA is that they couldn’t get the performance they needed from the existing library kernels. But the nice thing about dataflow is, it gives you what’s called automatic fusion. Because you get to place two kernels on the same chip at the same time and have them communicate efficiently with each other, you get fusion. If you look at what most CUDA programmers are trying to achieve when they write custom CUDA kernels, they are actually doing this fusion manually. They are taking two formerly separate kernels that didn’t communicate very efficiently and putting them together in a single kernel so that the communication is efficient.

What we do with dataflow is do that automatically. So you get to define kernels however you like, and then we will make sure that they run efficiently together.

BRIAN SANTO: Ah! Okay. All right. And my understanding is that that type of tool is still fairly uncommon in the AI area.

KUNLE OLUKOTUN: Yeah, it doesn’t exist. What we’re taking is a whole-graph analysis approach that starts at the framework level and analyzes the entire graph.

Now, one of the reasons conventional architectures don’t bother with this whole-graph approach, of course, is that they’re only executing things one kernel at a time, and then they’re shuttling all the data off chip to HBM and pulling it back for the next kernel. We don’t want to do that. That’s a waste of bandwidth, and it doesn’t give you the fusion advantages we just talked about.

Ideally, you want both of those kernels to run concurrently. And then you also get pipelining: While the second kernel is working on the earlier piece of data, the first kernel gets the next piece, and so on. You get this pipeline of computation happening on the chip. And then, of course, communication is very streamlined and very efficient.
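Here is a toy illustration of what fusion buys you, in plain NumPy and Python. This is only meant to show the memory-traffic difference between two separate kernels and one fused pass; it is not SambaNova’s actual mechanism.

```python
import numpy as np

x = np.random.randn(100_000).astype(np.float32)

# Unfused: two separate "kernels" with a full-size intermediate written
# out between them, the round trip to memory (HBM, in the GPU case)
# that whole-graph scheduling avoids.
def unfused(x):
    t = x * 2.0 + 1.0        # kernel 1: writes a temporary array
    return np.maximum(t, 0)  # kernel 2: reads the temporary back

# Fused: one streaming pass over the data, no intermediate buffer. This
# is what hand-written CUDA kernels do manually, and what a dataflow
# compiler can do automatically by placing both kernels on chip at once.
def fused(x):
    out = np.empty_like(x)
    for i in range(x.size):           # illustrative loop; a real compiler
        v = x[i] * 2.0 + 1.0          # emits this as native code
        out[i] = v if v > 0 else 0.0
    return out

assert np.allclose(unfused(x), fused(x))
```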
I remember being at Stanford for many years, and I once was listening to John Hennessy, who, of course, is a former president of Stanford and a great computer architect, describe pipelining as the best thing since sliced bread for computer architecture: the idea that, with a very, very small amount of extra resources, you get this tremendous performance improvement. You know, it was what drove RISC in the early days; it’s what drives the design of modern CPUs. We use it in this dynamic fashion to accelerate AI applications.

BRIAN SANTO: So how involved do you get with the applications of your customers? Ultimately my question is, has anybody come to you with something you either didn’t expect or found novel?

KUNLE OLUKOTUN: Yeah. I mean, we’re working with the National Labs, with Rick Stevens from Argonne National Laboratory, and they are developing models for modeling COVID. And they’ve got all sorts of imaging models. Initially, we take these models, and we may have to optimize a few things in the software to accommodate them. But they’re showing that they can train models faster than on GPUs. And they can get scalable performance on our DataScale systems that surpasses anything they’ve seen.

So there is this conversation we have with customers, especially with very sophisticated customers at the National Labs, who may want to program at the dataflow level, or may have interesting models that can help drive the development of the SambaFlow software we’ve been working on.

BRIAN SANTO: High-performance computing is kind of interesting, in that for the longest time it was: How many CPUs can you gang up? And at this point, we now have supercomputers that are largely GPU-based. Will AI processors always be just a module for those? Or will we ever see an AI supercomputer?

KUNLE OLUKOTUN: Yeah, we’re going to see an AI supercomputer, because, of course, what the HPC folks are trying to do is being transformed by AI, like everything else in the world. If you look at a model of anything (you’re trying to model airflow over a wing, you’re trying to model materials, you’re trying to understand how nuclear bombs explode), you’ve got some physical model that you are trying to simulate. You run through and simulate it.

So there’s a model that you might create from first principles, but then you can use the data generated from that model to train a machine learning-based model. And that model may run three orders of magnitude faster than your detailed physical model and be just as accurate, though maybe not in all regimes. And so what you see is this hybrid situation where some of the time you’re running the detailed physical model, some of the time you’re running this AI model, and then you might be dynamically training that AI model as you move into a different regime.
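That pattern, training a fast learned surrogate on data generated by a slow first-principles model, is easy to sketch. The “physics” function below is a stand-in invented for illustration; a real simulation might take hours per evaluation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in for an expensive first-principles simulation (hypothetical).
def physics_model(x):
    return np.sin(3 * x) * np.exp(-x**2)

# Generate training data from the slow model...
x_train = np.random.uniform(-2, 2, (5000, 1))
y_train = physics_model(x_train).ravel()

# ...and fit a fast learned surrogate to it.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                         random_state=0)
surrogate.fit(x_train, y_train)

# The surrogate is cheap to evaluate, but only trustworthy inside the
# regime it was trained on; outside [-2, 2] all bets are off, which is
# why the hybrid scheme falls back to (and retrains on) the full model.
x_test = np.array([[0.5], [3.0]])   # in-regime vs. out-of-regime
print(physics_model(x_test).ravel(), surrogate.predict(x_test))
```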
So again, we’re going back to the earlier discussion we had about convergence between training and inference. You see it happening here in the high-performance computing regime, too. And so, ideally, what you want is a single capability that can do all of these kinds of computation very efficiently.

BRIAN SANTO: Very cool. I’m out of the prepared questions I had for you, so let me throw you the general question: What have you been working on recently that you found fun, surprising, interesting, gratifying, whatever?

KUNLE OLUKOTUN: What have I been working on? At the end of the day, I’m a hardcore technology guy who loves to work with PhD students, develop new ideas, and get inspired by their exciting developments. The interesting thing that has come out most recently (maybe a little esoteric, but it’s exciting to me) is this: In programming, you think of the dominant idea as being the thread, right? The thread of control. And you think of a thread of control as some register state and some memory state, plus a program counter. That’s a thread of control.

The first work that I’m most well known for is the whole idea of multi-core and multi-threaded CPUs back in the ’90s. We pioneered some of those ideas. Fast forward to these dataflow architectures, where we thought, well, fundamentally, we’re going to move from a threading model to this dataflow model, where you’re just focusing on how data moves. And that’s the important thing.

So the really interesting thing was how you could bring these two ideas together. We have this new paradigm that we call dataflow threads. It’s really a way of taking threading and making it into dataflow. One of the ways to think about these threads is that you want to do a bunch of things, and you move them around as if they’re dataflow. It gives you a lot of capabilities. It makes them both more flexible and more efficient than the threading you get with GPUs. And it makes it possible to work with very irregular data structures like hash tables and trees, all these things with weird properties that don’t easily fit into the classic dataflow model.
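For context on what “the classic dataflow model” means, here is a minimal toy interpreter in Python. It is entirely illustrative and has nothing to do with SambaNova’s hardware: each operation fires as soon as all of its operands have arrived, with no program counter ordering the work.

```python
# Toy dataflow interpreter: nodes fire when all operands have arrived.
# (Textbook model, purely for illustration; not SambaNova's design.)

class Node:
    def __init__(self, fn, n_inputs, consumers):
        self.fn, self.inputs, self.n = fn, {}, n_inputs
        self.consumers = consumers        # (node, port) pairs fed by our output

    def receive(self, port, value, ready):
        self.inputs[port] = value
        if len(self.inputs) == self.n:    # all operands present: fire
            ready.append((self, self.fn(*(self.inputs[p] for p in range(self.n)))))

def run(sources):
    ready = []
    for node, port, value in sources:
        node.receive(port, value, ready)
    while ready:                          # data availability drives execution;
        node, result = ready.pop()        # there is no program counter
        for consumer, port in node.consumers:
            consumer.receive(port, result, ready)
        if not node.consumers:
            print("output:", result)

# (a + b) * c as a dataflow graph
mul = Node(lambda x, y: x * y, 2, [])
add = Node(lambda x, y: x + y, 2, [(mul, 0)])
run([(add, 0, 2), (add, 1, 3), (mul, 1, 4)])   # prints: output: 20
```

Dataflow threads, as described here, relax exactly the rigidity this toy model shows: the fixed graph and the fixed firing rule.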
Well, this idea of dataflow threads breaks out of the current constraints of dataflow and gives you the benefits that everybody associates with threads. But it’s actually more flexible, in that you can create threads on the fly, and you don’t get a constraint of conventional GPU threads called thread divergence. That doesn’t affect dataflow threads.

And so the whole idea is: How can you bring dataflow and threads together, get the benefits, and then show how it impacts what you can do with what we call irregular applications, applications that aren’t like matrix models? Dense matrix multiply is your classic regular application, because you know from the beginning what’s going to happen at the end. You know what the memory references are, you know what the data is. There’s no surprise.

An irregular application fetches something from memory, looks at it, and maybe decides where to branch based on that. That’s irregular. What data are you going to touch? You don’t know. So how can you bring these different modalities together and be very efficient when you want to do dense matrix multiply, but also very efficient when you want to do sparse matrix multiply? Or do graph analysis, or work on trees? Or do things that don’t look regular?

BRIAN SANTO: Are there problem categories that would lend themselves to this kind of approach?

KUNLE OLUKOTUN: The benefit is, that’s the direction machine learning is moving in. Look at something like the Google TPU and its dense matrix multiply. Well, if you could actually do sparse training, then you could potentially get rid of 90% of the compute. But now what you need is something that is optimized for doing sparse matrix multiply, or sparse operations generally.

BRIAN SANTO: Right. And these have usually been, in my understanding, two different categories of AI.

KUNLE OLUKOTUN: Yeah, yeah. Look at the growth of the models. Basically, we are doubling the number of parameters and the amount of compute every two and a half months. That’s the cadence we’ve been on since 2017. That’s clearly unsustainable. So what we’re going to have to do is be clever about how we develop our models. We still want to increase accuracy, but we can’t do it in a brute-force manner. We’re going to have to do it in a way that requires us to be clever.

And the way we’re going to be clever is by using sparsity, by doing things that are less regular. Regular, as I said, is dense matrix multiply: You know at the beginning of the algorithm what’s going to happen at the end. Irregular is: I don’t know what data I’m going to fetch. I don’t know what direction I’m going to go. That’s tough.
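A quick sketch of the arithmetic behind that 90% figure, using a hypothetical density and SciPy on a CPU purely for illustration: a weight matrix that is 90% zeros stores and multiplies only a tenth of its entries, if the hardware can exploit the irregularity.

```python
import numpy as np
from scipy import sparse

# A weight matrix where ~90% of entries are zero (hypothetical density).
rng = np.random.default_rng(0)
dense = rng.standard_normal((4096, 4096)).astype(np.float32)
dense[rng.random(dense.shape) < 0.9] = 0.0     # zero out ~90%

x = rng.standard_normal((4096, 64)).astype(np.float32)

# The dense multiply touches all ~16.8M entries; the sparse version
# stores and multiplies only the ~1.7M nonzeros, roughly the compute
# savings described above.
w = sparse.csr_matrix(dense)
y_dense = dense @ x
y_sparse = w @ x
assert np.allclose(y_dense, y_sparse, atol=1e-3)
print(f"nonzeros: {w.nnz / dense.size:.1%} of entries")
```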
BRIAN SANTO: That’s wild! So what are we talking about? Product in Q2 next year?

KUNLE OLUKOTUN: For what?

BRIAN SANTO: For what we’ve just been talking about!

KUNLE OLUKOTUN: Yeah, I don’t know.

BRIAN SANTO: Okay. So I’ll tell the financial community to back off just a little bit.

KUNLE OLUKOTUN: Yeah.

BRIAN SANTO: All right, Kunle. Thank you very much for your time. It was a delight talking to you!

KUNLE OLUKOTUN: Okay. Thank you a lot for your time, too. It was great to chat with you.

BRIAN SANTO: We’ve been talking with Kunle Olukotun, chief technologist of AI startup SambaNova.

By the way, my EE Times colleague Sally Ward-Foxton interviewed SambaNova CEO Rodrigo Liang last summer. You can read that article on our web site.