Yingyan (Celine) Lin - Research

It has been a privilege for me to collaborate with esteemed professors and colleagues who have inspired me and helped shape my research philosophy and vision. My PhD work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), a $30 million research center led by my PhD advisor Prof. Naresh R. Shanbhag and Prof. Andrew C. Singer and sponsored by SRC and DARPA.

Research Background

Machine learning (ML) systems are proving highly effective at tackling the data deluge of the 21st century. Many see this as the fourth industrial revolution, given the exponential speed and enormous scale of the current and potential transformation. Recently, ML systems have exceeded human performance in some applications, such as million-scale object recognition. However, this record-breaking performance comes at a large energy cost. For example, Google’s AlphaGo, which amazed everyone by beating the human Go champion this year, runs on 1,202 CPUs and 176 GPUs and requires more than four orders of magnitude more power than the human brain. It is therefore imperative to design energy-efficient ML systems in order to realize their full benefits. I am primarily interested in addressing this fundamental problem, which is interdisciplinary and spans devices, circuits, VLSI systems and architectures, and ML algorithms.

Current Research

Current ML systems are either centralized in the cloud or distributed at the edge. In cloud platforms, data from end users’ devices, such as mobile phones, are transferred to data centers that execute ML algorithms on CPU and GPU clusters; the extracted information is then transferred back to the users’ devices. While cloud computing is rapidly expanding, recent work shows that the energy cost of transferring data between data centers and local devices can be a significant fraction of the total energy of cloud computing when the usage rate and data volume are large.
Therefore, there has been increasing interest in enabling local inference capability on end users’ devices. Local processing of raw data reduces energy and latency, and enhances privacy. Both platforms face a grand energy efficiency challenge, as described next. In my PhD research, I investigate techniques to address this challenge.

Energy Efficiency Challenge in the Data Center

US data centers reportedly consumed about 70 billion kilowatt-hours of electricity in 2014, representing 2% of the country’s total energy consumption. Indeed, the costs of power and cooling are becoming significant factors in the total expenditures of large-scale data centers. In particular, data transfer, due to inter-chip, inter-board, inter-shelf, and inter-rack communications within data centers and inter-site communications between data centers, is one of the dominant energy costs. For example, the I/O interface consumes about 20%-70% of the total power in a state-of-the-art 48-core processor. This will only worsen given the growing demand for I/O bandwidth in high-performance computing: recent projections indicate that the I/O bandwidth demand of supercomputers will exceed 750 TB/s by the year 2020, at which point I/O power could reach half of the CPU power.

- BER-optimal ADC-Based Receiver (BOA) for Serial Links: This work was selected for presentation at the IEEE International Solid-State Circuits Conference Student Research Preview in 2015 (ISSCC SRP 2015) and published in IEEE Transactions on Circuits and Systems I (IEEE TCAS-I) in 2016. More importantly, the technique provides a promising solution to the well-known interface power bottleneck in data centers.

Energy Efficiency Challenge at the Edge

Devices at the edge, including smartphones, autonomous vehicles, wearable devices, and many others, have limited energy, computational, and storage resources because they are battery-powered and have a small form factor.
On the other hand, many ML algorithms are computationally intensive. For example, a state-of-the-art convolutional neural network (CNN), AlexNet, requires 666 million MACs per 227×227 image (about 13k MACs/pixel). The energy efficiency challenge is therefore exacerbated when ML algorithms are embedded for local inference. Conventional designs rely on voltage and process scaling for energy efficiency, both of which have stagnated. For the problem of resource-constrained computing at the edge, I tackle the energy-efficient implementation of ML algorithms, particularly CNNs. CNNs have recently attracted considerable interest due to their record-breaking performance on many recognition tasks, but their computational complexity hinders their deployment on power-constrained embedded platforms. In my research, I have proposed two techniques for energy-efficient CNN design.
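As a sanity check, the per-pixel figure follows directly from the cited totals (a quick back-of-the-envelope calculation, not a number from the original paper):

```python
# AlexNet workload figures cited above: 666 million MACs per forward
# pass on a single 227x227 input image.
total_macs = 666e6
pixels = 227 * 227            # 51,529 input pixels
macs_per_pixel = total_macs / pixels
print(round(macs_per_pixel))  # prints 12925, i.e. roughly 13k MACs/pixel
```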
- RD-SEC: Variation-tolerant Architectures for Convolutional Neural Networks in the Near Threshold Voltage Regime: This work received the 2nd-place Best Student Paper Award when I presented it at the IEEE International Workshop on Signal Processing Systems in 2016 (SiPS 2016).
- PredictiveNet: An Energy-efficient Convolutional Neural Network via Zero Prediction: This work has been submitted to the IEEE International Symposium on Circuits and Systems 2017 (ISCAS 2017) for peer review.

Future Research

Looking forward, I am excited to continue working in the broad area of energy-efficient ML systems. My research experience, including BOA, RD-SEC, and PredictiveNet described above, together with related coursework such as VLSI in Signal Processing and Communications, Machine Learning in Silicon, and Machine Learning for Signal Processing, gives me a solid understanding of this fundamental yet critical challenge. In my view, this challenge should be addressed by taking a holistic view of the entire information gathering and processing stack. Specifically, I would like to explore the following three directions through active collaboration with faculty members in related areas.

Systems: Many ML algorithms are essentially optimization problems that minimize certain loss functions. This provides a system-level opportunity to improve energy efficiency: the original optimization problem can be reformulated with additional architecture or circuit constraints that account for energy. Such resource-constrained reformulation would enable the systematic design of energy-efficient ML systems. Possible constraints include the cost of data movement and the precision of data representation, among others. For example, imposing constraints that suppress estimation errors in my proposed RD-SEC technique could allow the estimators themselves to guarantee marginal system performance loss and eliminate the need for power-hungry implementations. Another example is to constrain the training algorithms such that the sparsity predictors in PredictiveNet can be shared by multiple kernels, reducing cost even further.
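The zero-prediction idea can be sketched as follows: since ReLU zeroes out negative pre-activations, a low-cost estimate computed from only the most-significant bits of a MAC can predict the sign and, when negative, skip the remaining low-order computation. This is a minimal NumPy sketch of the concept under assumed bit-widths; `predictive_relu_mac` and its parameters are illustrative, not the paper's implementation:

```python
import numpy as np

def predictive_relu_mac(w, x, msb_bits=4, total_bits=8):
    """Zero-prediction sketch: estimate the dot product's sign from
    the most-significant bits of the operands; if the estimate is
    negative, ReLU would zero the output anyway, so the full-precision
    MAC is skipped. (Illustrative only, not the published design.)"""
    shift = total_bits - msb_bits
    # Low-cost estimate: keep only the MSBs of weights and inputs.
    w_msb = (w >> shift) << shift
    x_msb = (x >> shift) << shift
    estimate = int(np.dot(w_msb, x_msb))
    if estimate < 0:
        return 0, True             # predicted zero: LSB work skipped
    full = int(np.dot(w, x))       # full-precision MAC
    return max(full, 0), False     # ReLU on the exact pre-activation

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=64)   # 8-bit signed weights (assumed)
x = rng.integers(-128, 128, size=64)   # 8-bit signed activations
out, skipped = predictive_relu_mac(w, x)
```

The prediction can occasionally be wrong (an MSB estimate just below zero while the exact value is slightly positive); bounding the resulting accuracy loss against the energy saved is exactly the trade-off such schemes must manage.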
Architectures: The design of ML systems has traditionally employed an expensive worst-case design methodology to ensure reliable circuit operation, limiting the achievable energy efficiency. New architectures should instead embrace the inherent robustness of ML algorithms and bridge the gap between the statistical nature of performance metrics in ML systems and the stochastic device behavior of nanoscale fabrics. I would like to explore the possibility of completely eliminating the partition between data storage and processing units. I speculate that one promising solution is to develop energy-efficient combined storage-and-processing units and to distribute many such units in an energy-minimizing manner. I would also like to investigate new statistical error compensation (SEC) techniques that take advantage of ML algorithms’ inherent tolerance to errors and the redundancy within the algorithms themselves. In fact, the RD-SEC technique I proposed is one such step: it exploits the fact that a large fraction of the computation inside a matrix-vector multiplication (a commonly employed ML kernel) can be derived from a small subset for low-cost error detection and correction, thus enabling robust operation in the NTV regime.

Circuits: ML algorithms relax the precision and linearity requirements of the underlying circuits and devices, which opens up opportunities for energy efficiency. Specifically, I would like to leverage my background in circuits and devices to study new circuit techniques that exploit tolerable non-linearities for energy savings. On the device side, emerging technologies, such as CNFET or SPIN devices, have the potential for aggressive energy savings or increased density, but are subject to various hardware errors such as defects and timing errors. I am particularly interested in investigating new SEC techniques that enable robust ML systems on emerging beyond-CMOS technologies, unlocking their excellent energy efficiency.
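The low-cost estimation idea behind this can be illustrated with a toy low-rank example: if most rows of a weight matrix are (approximately) linear combinations of a few reference rows, then most outputs can be estimated from a handful of reference dot products and used to flag and correct large hardware errors. The sketch below is a simplified illustration of that observation, not the RD-SEC algorithm itself; the function name, basis choice, and threshold are assumptions:

```python
import numpy as np

def rdsec_correct(W, x, y_hw, k=2, tau=1.0):
    """Estimate each entry of y = W @ x from k reference rows of W,
    then detect and correct hardware outputs that deviate from the
    estimate by more than tau. (Simplified illustration; in practice
    the coefficient fit would be done offline.)"""
    B = W[:k]                                      # reference rows
    C, *_ = np.linalg.lstsq(B.T, W.T, rcond=None)  # fit W ~= C.T @ B
    y_est = C.T @ (B @ x)                          # low-cost estimates
    faulty = np.abs(y_hw - y_est) > tau            # error detection
    return np.where(faulty, y_est, y_hw), faulty   # error correction

# Toy rank-2 weight matrix: every row is a combination of the first two.
rng = np.random.default_rng(1)
B = rng.normal(size=(2, 8))
W = np.vstack([B, rng.normal(size=(4, 2)) @ B])    # 6 outputs, rank 2
x = rng.normal(size=8)
y_hw = W @ x
y_hw[3] += 5.0                                     # inject a large error
y_fixed, flags = rdsec_correct(W, x, y_hw)
```

In hardware, the coefficients would be fit offline and the threshold `tau` set from the expected error statistics of the NTV circuit, so the per-inference overhead is only the k reference dot products and the comparisons.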
Moving forward, I look forward to working with passionate graduate students and collaborating with professors from various fields and departments. My extensive background and research experience in devices, circuits, VLSI systems and architectures, and ML algorithms will help ensure that my research is both innovative and practical.