research-article

Warehouse-scale video acceleration: co-design and deployment in the wild

Published: 17 April 2021

Abstract

Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain-specific accelerators
  2. hardware-software codesign
  3. video transcoding
  4. warehouse-scale computing

Qualifiers

  • Research-article

Conference

ASPLOS '21

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 353
  • Downloads (Last 6 weeks): 26
Reflects downloads up to 14 Sep 2024

Cited By

  • (2024) Serverless? RISC more! Proceedings of the 2nd Workshop on SErverless Systems, Applications and MEthodologies. DOI: 10.1145/3642977.3652095 (15-24). Online publication date: 22-Apr-2024
  • (2024) Intel Accelerators Ecosystem: An SoC-Oriented Perspective : Industry Product. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). DOI: 10.1109/ISCA59077.2024.00066 (848-862). Online publication date: 29-Jun-2024
  • (2024) Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). DOI: 10.1109/HPCA57654.2024.00083 (1043-1062). Online publication date: 2-Mar-2024
  • (2024) Cloud media video encoding: review and challenges. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-18763-2. Online publication date: 9-Mar-2024
  • (2023) A Six-Word Story on the Future of VLSI: AI-driven, Software-defined, and Uncomfortably Exciting. 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits). DOI: 10.23919/VLSITechnologyandCir57934.2023.10185339 (1-4). Online publication date: 11-Jun-2023
  • (2023) A Cloud-Scale Characterization of Remote Procedure Calls. Proceedings of the 29th Symposium on Operating Systems Principles. DOI: 10.1145/3600006.3613156 (498-514). Online publication date: 23-Oct-2023
  • (2023) Efficient video processing at scale using MSVP. Applications of Digital Image Processing XLVI. DOI: 10.1117/12.2685875 (38). Online publication date: 4-Oct-2023
  • (2023) CPU Microarchitectural Performance Analysis of SVT-AV1 Encoder. 2023 IEEE International Conference on Image Processing (ICIP). DOI: 10.1109/ICIP49359.2023.10222388 (3045-3049). Online publication date: 8-Oct-2023
  • (2022) Alliance for Open Media (AOMedia) Progress Report. SMPTE Motion Imaging Journal 131:8. DOI: 10.5594/JMI.2022.3190532 (88-92). Online publication date: Sep-2022
  • (2022) Towards a fully disaggregated and programmable data center. Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems. DOI: 10.1145/3546591.3547527 (18-28). Online publication date: 23-Aug-2022
