## Ultra Low Power and Ultra High Speed Information Processing System Based on Single Flux Quantum Logic

Nagoya Univ.\*, Yokohama National Univ.\*\*, Hokkaido Univ.\*\*\*, Communication Research Lab.\*\*\*\*

Akira Fujimaki\*, Naofumi Takagi\*, Nobuyuki Yoshikawa\*\*, Yoshiaki Takai\*\*\*, Hirotaka Terai\*\*\*\*

Abstract: Single-flux-quantum (SFQ) logic has high potential for building ultra-low-power, high-performance information processing systems because of its intrinsic properties. We have already developed SFQ ICs using a cell-based design technique. We have been studying the micro-architecture of a processor that executes Java bytecodes directly, and a multi-processor system built by connecting such processors and memories through ultra-broadband SFQ network switches. We also study a timing-driven design technique on the picosecond scale for realizing SFQ LSIs operating at up to 100 GHz. Throughout this study we employ the concept of complexity reduction (CORE), in which the high speed of SFQ circuits is used to reduce the circuit scale or simplify the micro-architecture, while the multi-processor system as a whole retains high performance. Reflecting the low-power nature of SFQ circuits, the total power consumption, including the cryo-cooler, is one order of magnitude smaller than that of a system based on semiconductor technology.

## Why SFQ?

Future network devices, including servers, require extremely high throughput as well as low power consumption. SFQ technology can satisfy these two requirements simultaneously (see the figure below). SFQ signals are return-to-zero pulses, so the circuits are freed from the interconnect recharging process that limits the operating speed of semiconductor LSIs. The width of an SFQ pulse is only a few ps, which allows a signal processor to operate at up to 100 GHz. An SFQ gate consumes only 100 nW in an operation period.
The power-delay (period) product of an SFQ gate is three orders of magnitude smaller than that of other devices.

## Implementation of CORE Processor

To realize an ultra-high-speed processor, the register files should be placed as close to the ALU as possible, whatever the micro-architecture. The low-power nature of SFQ circuits enables us to construct true 3D circuits, which gives short interconnections between the ALU and the register files. The floor plan of a CORE processor is shown in the next figure. With a 100 GHz clock, the processor alone acts as a 25 GHz-clock processor.

The processor is a so-called Java processor that executes Java bytecodes. From the programmer's point of view, an object-oriented, class-based, machine-independent language such as Java is needed to reduce time-to-market and development cost. The Java processor is based on a stack architecture, and SFQ circuits can easily realize stack caches using shift registers. The photograph below shows a shift register circuit studied under a MEXT project. The circuit operates successfully at up to 55 GHz. The cell-based design technique used here can be extended to the LSI scale.

## Multi-Processor System

Higher performance is obtained by building a multi-processor system in which CORE processors and memories are connected through SFQ network switches. The figure below is the block diagram of the multi-processor system. The system consists of UMA (uniform memory access) blocks and a cache-coherent NUMA block. The bandwidth of the UMA block reaches 12 Tbps.

## Power Consumption

The table below shows the estimated power consumption of several combinations of processors and memories.

Equivalent CPU performance: 128 bit, 100 GHz local clock. Size of main memory: 4 GB.

| System | Component | Power consumption | Total |
|---|---|---|---|
| SFQ multiprocessor with SFQ shift register memory | CPU | 32 mW | 300 W |
| | Network switch | 50 mW | |
| | SFQ memory | 20 mW | |
| | Cooler | 300 W | |
| SFQ multiprocessor with RT CMOS memory | CPU | 32 mW | 450 W |
| | Network switch | 50 mW | |
| | Cooler | 250 W | |
| | Memory | 200 W | |
| CMOS microprocessor | CPU | 6 kW | 8.7 kW |
| | Memory | 200 W | |
| | Cooler | 2.5 kW | |
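
The totals in the table can be cross-checked by summing the component estimates for each configuration. The sketch below is only a sanity check on the tabulated figures. The dictionary keys and the helpers `total_watts` and `power_ratio` are illustrative names that do not appear in the original work, and the "cooler" key refers to the cooling entry of each configuration.

```python
# Component power estimates in watts, transcribed from the table above.
# All names here are illustrative; they are not part of the original work.
SYSTEMS = {
    "sfq_with_sfq_shift_register_memory": {
        "cpu": 32e-3,
        "network_switch": 50e-3,
        "memory": 20e-3,
        "cooler": 300.0,
    },
    "sfq_with_rt_cmos_memory": {
        "cpu": 32e-3,
        "network_switch": 50e-3,
        "cooler": 250.0,
        "memory": 200.0,
    },
    "cmos_microprocessor": {
        "cpu": 6000.0,
        "memory": 200.0,
        "cooler": 2500.0,
    },
}


def total_watts(components):
    """Sum the component powers of one configuration, in watts."""
    return sum(components.values())


def power_ratio(baseline, candidate):
    """Return how many times more power the baseline draws than the candidate."""
    return total_watts(SYSTEMS[baseline]) / total_watts(SYSTEMS[candidate])


if __name__ == "__main__":
    for name, components in SYSTEMS.items():
        print(f"{name}: {total_watts(components):.1f} W")
    ratio = power_ratio("cmos_microprocessor", "sfq_with_sfq_shift_register_memory")
    print(f"CMOS / all-SFQ power ratio: {ratio:.1f}x")
```

Under these figures the three configurations come to roughly 300 W, 450 W, and 8.7 kW. The CMOS system therefore draws about 29 times the power of the all-SFQ system and about 19 times that of the SFQ system with room-temperature CMOS memory, which is consistent with the one-order-of-magnitude saving stated in the abstract. In both SFQ configurations the logic itself contributes only about 0.1 W, so the cooler and any room-temperature memory dominate the total.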