Sep 21 2016
In 2015, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory unveiled a new way of managing memory on computer chips, one that promises more efficient use of circuit space as chips come to rely on ever larger numbers of cores, or processing units.
In chips with hundreds of cores, the method can free up 15 to 25 percent of on-chip memory, enabling more efficient computation.
However, the method assumed a type of computational behavior that most contemporary chips do not, in fact, enforce. At the International Conference on Parallel Architectures and Compilation Techniques, the same conference where the original version debuted last year, the researchers presented an improved version that is consistent with the behavior of modern chips.
The hurdle to overcome is that conventional computer programs are written as sequences of instructions, whereas multicore chips execute instructions in parallel. Computer scientists are continually developing techniques to make that parallelization easier for programmers.
The first version developed by the MIT researchers, known as Tardis, enforced the standard called sequential consistency. Consider a program whose parts include the instruction sequences ABC and XYZ: when the program is parallelized, A, B, and C are assigned to core 1, while X, Y, and Z are assigned to core 2.
The sequential-consistency standard imposes no constraint on the relative timing of instructions assigned to different cores. It does not guarantee that core 2 will execute its first instruction, X, before core 1 moves on to its second, B; nor does it guarantee that core 2 will even begin executing X before core 1 finishes its last instruction, C. The only guarantee it provides is that core 1 will execute A before B and B before C, while core 2 will execute X before Y and Y before Z.
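To make that guarantee concrete, here is a minimal C++ sketch of the ABC/XYZ example; the variable names, the use of std::atomic with its default sequentially consistent ordering, and the thread setup are illustrative assumptions, not anything drawn from the paper.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> a{0}, b{0}, c{0}, x{0}, y{0}, z{0};

void core1() {
    a.store(1);  // A
    b.store(1);  // B -- guaranteed to come after A
    c.store(1);  // C -- guaranteed to come after B
}

void core2() {
    x.store(1);  // X
    y.store(1);  // Y -- guaranteed to come after X
    z.store(1);  // Z -- guaranteed to come after Y
}

int main() {
    std::thread t1(core1), t2(core2);
    t1.join();
    t2.join();
    // Any interleaving of {A, B, C} with {X, Y, Z} is permitted, as long as
    // each core's own program order is preserved.
    std::printf("done\n");
    return 0;
}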
Xiangyao Yu, a graduate student in electrical engineering and computer science, is the paper’s first author. He is joined by his thesis advisor, Srini Devadas, the Edwin Sibley Webster Professor in MIT’s Department of Electrical Engineering and Computer Science, and by Hongzhe Liu of Algonquin Regional High School and Ethan Zou of Lexington High School, who joined the project through MIT’s Program for Research in Mathematics, Engineering and Science (PRIMES).
Planned Disorder
Most modern chips do not enforce sequential consistency even for reading and writing data, the only operations that Tardis and other memory-management schemes are concerned with. For example, a standard chip from Intel might assign a core the read/write instruction sequence ABC but allow it to execute in the order ACB.
If the standards of consistency are relaxed, the chips can operate faster.
Let’s say that a core performs a write operation, and the next instruction is a read. Under sequential consistency, I have to wait for the write to finish. If I don’t find the data in my cache [the small local memory bank in which a core stores frequently used data], I have to go to the central place that manages the ownership of data. This may take a lot of messages on the network, and depending on whether another core is holding the data, you might need to contact that core. But what about the following read? That instruction is sitting there, and it cannot be processed. If you allow this reordering, then while this write is outstanding, I can read the next instruction. And you may have a lot of such instructions, and all of them can be executed.
Xiangyao Yu, Graduate Student, MIT
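A hedged C++ sketch of the reordering Yu describes follows: a store and a subsequent load of a different location are marked relaxed, so the load need not wait for the write to become globally visible. The variable names and the use of std::memory_order_relaxed are illustrative assumptions, not the paper’s notation.

#include <atomic>
#include <cstdio>

std::atomic<int> data{0};   // the location being written
std::atomic<int> other{0};  // a different location read immediately afterward

int core_work() {
    data.store(42, std::memory_order_relaxed);  // this write may miss in the cache
    // Under sequential consistency the next load would have to wait for the
    // write to complete; under a relaxed model it can execute while the write
    // is still outstanding, and so can many instructions after it.
    return other.load(std::memory_order_relaxed);
}

int main() {
    std::printf("%d\n", core_work());
    return 0;
}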
Rather than coordinating memory operations by chronological time, as contemporary memory-management schemes do, Tardis coordinates them by “logical time,” which lets it make more economical use of chip space. With Tardis, every data item in the shared memory bank carries its own time stamp.
Each core also has its own counter, which it uses, in effect, to time-stamp the operations it executes. No two cores’ counters need to agree: if a core can treat its computations as having been performed earlier in logical time, it can keep working with its locally cached copy of a data item even after the copy in main memory has been updated.
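The following is a minimal C++ sketch of that idea, assuming an invented lease-style layout; the struct names, fields, and read rule are illustrative and do not reproduce the actual Tardis implementation.

#include <cstdint>
#include <unordered_map>

struct Line {
    int      value    = 0;
    uint64_t write_ts = 0;  // logical time at which the value was written
    uint64_t lease_ts = 0;  // logical time up to which cached reads remain valid
};

struct Core {
    uint64_t clock = 0;                        // this core's logical time; it never
                                               // has to agree with any other core's
    std::unordered_map<uint64_t, Line> cache;  // locally cached copies, keyed by address
};

// Read through the cache: as long as the core treats the read as happening no
// later than the line's lease, it can keep using its local copy even if the
// copy in main memory has since been updated.
int cached_read(Core &core, uint64_t addr, const Line &from_memory) {
    auto it = core.cache.find(addr);
    if (it != core.cache.end() && core.clock <= it->second.lease_ts) {
        return it->second.value;               // hit: no network traffic needed
    }
    core.cache[addr] = from_memory;            // miss: refresh the local copy
    if (core.clock < from_memory.write_ts)
        core.clock = from_memory.write_ts;     // jump forward in logical time
    return from_memory.value;
}

int main() { return 0; }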
Division of Labor
To let Tardis accommodate more relaxed consistency standards, Yu and his colleagues simply gave every core two counters, one for read operations and one for write operations. If a read is selected for execution before a write that precedes it in the program has completed, it is stamped with a lower logical time, so the chip can still reconstruct the events in their proper sequence.
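A hedged sketch of how separate read and write counters might be used is shown below; the stamping rules and names are illustrative assumptions, not the paper’s exact protocol.

#include <algorithm>
#include <cstdint>

struct CoreCounters {
    uint64_t read_ts  = 0;  // logical time of the core's reads
    uint64_t write_ts = 0;  // logical time of the core's writes
};

// Stamp a read that runs ahead of a pending write: it only needs to be
// ordered after the write that produced the value it observes, so its stamp
// may be lower than the core's write counter.
uint64_t stamp_read(CoreCounters &c, uint64_t value_written_at) {
    c.read_ts = std::max(c.read_ts, value_written_at);
    return c.read_ts;
}

// Stamp a write: it must be ordered after any earlier read of the location it
// overwrites, so it advances past that read's logical time.
uint64_t stamp_write(CoreCounters &c, uint64_t last_read_of_location) {
    c.write_ts = std::max(c.write_ts + 1, last_read_of_location + 1);
    return c.write_ts;
}

int main() { return 0; }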
The study describes how the counters can be coordinated to enforce the differing consistency rules adopted by different manufacturers, in both single-core and multicore chips. “Because we have time stamps, that makes it very easy to support different consistency models,” Yu says. “Traditionally, when you don’t have the time stamp, then you need to argue about which event happens first in physical time, and that’s a little bit tricky.”
The new work is important because it’s directly related to the most popular relaxed-consistency model that’s in current Intel chips. There were many, many different consistency models explored by Sun Microsystems and other companies, most of which are now out of business. Now it’s all Intel. So matching the consistency model that’s popular for the current Intel chips is incredibly important.
Larry Rudolph, Vice President and Senior Researcher, Two Sigma
Rudolph, who works with an extensive distributed-computing system, believes that the ability of Tardis to offer a unified framework for managing memory at the level of the computer network, the core and all levels in between, is its major appeal. “Today, we have caching in microprocessors, we have the DRAM [dynamic random-access memory] model, and then we have storage, which used to be disk drive,” he says. “So there was a factor of maybe 100 between the time it takes to do a cache access and DRAM access, and then a factor of 10,000 or more to get to disk. With flash [memory] and the new nonvolatile RAMs coming out, there’s going to be a whole hierarchy that’s much nicer. What’s really exciting is that Tardis potentially is a model that will span consistency between processors, storage, and distributed file systems.”