research-article

Warehouse-scale video acceleration: co-design and deployment in the wild

Authors:

Parthasarathy Ranganathan,

Daniel Stodolsky,

Jeremy Dorfman,

Marisabel Guevara,

Clinton Wills Smullen IV,

Raghu Balasubramanian,

Sandeep Bhatia,

Prakash Chauhan,

Niranjani Dasharathi,

Roy W. Huffman Jr.,

Elisha Indupalli,

Indira Jayaram,

Poonacha Kongetira,

David Alexander Munday,

Srikanth Muroor,

Narayana Penukonda,

Eric Perkins-Argueta,

Ville-Mikko Rautio,

Yolanda Ripley,

Sergey N. Sokolov,

Mark S. Wachsler,

Andrew C. Walton,

David A. Wickeraad,

Hon Kwan Wu

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 600 - 615

https://doi.org/10.1145/3445814.3446723

Published: 17 April 2021

Abstract

Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design, including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and for co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks, improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Hardware-software codesign

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 2021

1090 pages

ISBN:9781450383172

DOI:10.1145/3445814

General Chair: Tim Sherwood (University of California at Santa Barbara, USA)

Program Chairs: Emery Berger (University of Massachusetts at Amherst, USA), Christos Kozyrakis (Stanford University, USA)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '21

Sponsor:

SIGPLAN

ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 19 - 23, 2021

Virtual, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

Total Citations: 20
Total Downloads: 8,827
Downloads (Last 12 months): 353
Downloads (Last 6 weeks): 26

Reflects downloads up to 14 Sep 2024
