Research Interests
Optimizing compilers, Language runtime systems, Parallel algorithms, Deep learning algorithms
Publications
Conference Papers
- Hiroshi Inoue, 'Multi-step LRU: SIMD-based Cache Replacement for Lower Overhead and Higher Precision', IEEE International Conference on Big Data (IEEE BigData 2021), online, December 15-18, 2021. (short paper, Acceptance rate: (97+96)/486 = 39.7%)
- Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Yulong Li, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonano, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce Fleischer, Michael Guillorn, Howard Haynie, Jinwook Jung, Mingu Kang, Kyu-hyoun Kim, Siyu Koswatta, Saekyu Lee, Martin Lutz, Silvia Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot Rider, Kerstin Schelm, Michael Scheuermann, Joel Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matt Ziegler, Vinay Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan, 'RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference', International Symposium on Computer Architecture (ISCA 2021), June 14-16, 2021.
- Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Swagath Venkataramani, Jungwook Choi, Wei Wang, Vijayalakshmi Srinivasan, Moriyoshi Ohara and Kailash Gopalakrishnan, 'A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel', IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips XXII), Yokohama, Japan, April 17-19, 2019.
- Hiroshi Inoue, 'Adaptive Ensemble Prediction for Deep Neural Networks based on Confidence Level', The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019), Naha, Okinawa, Japan, April 16-18, 2019. (Acceptance rate: 360/1111 = 32.4%)
- Hiroshi Inoue, 'Fast Interpolation of Grid Data at a Non-Grid Point', IEEE International Conference on Big Data (IEEE BigData 2017), Boston, MA, USA, December 11-14, 2017. (Acceptance rate: 79/437 = 18.1%)
- Jan Wróblewski, Kazuaki Ishizaki, Hiroshi Inoue and Moriyoshi Ohara, 'Accelerating Spark Datasets by inlining deserialization', 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2017), May 29-June 2, 2017. (Acceptance rate: 116/508 = 22.8%)
- Masaru Ito, Hiroshi Inoue and Kenjiro Taura, 'Fragmented BWT: Extended BWT for full-text indexing', International Symposium on String Processing and Information Retrieval (SPIRE 2016), Beppu, Japan, October 18-20, 2016. (Acceptance rate: 25/46 = 54.3%)
- Hiroshi Inoue, 'How SIMD Width Affects Energy Efficiency: A Case Study on Sorting', IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips XIX), Yokohama, Japan, April 20-22, 2016. (Acceptance rate: 11/22 = 50.0%)
- Hiroshi Inoue, 'Efficient Tomographic Reconstruction For Commodity Processors with Limited Memory Bandwidth', The 2016 IEEE International Symposium on Biomedical Imaging (ISBI 2016), Prague, Czech Republic, April 13-16, 2016. pp 747-750. (Acceptance rate: 337/649 = 51.9%)
- Hiroshi Inoue and Kenjiro Taura, 'SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures', PVLDB 8(11), pp 1274-1285, presented in 41st International Conference on Very Large Data Bases (VLDB 2015). Kohala Coast, Hawaii, USA, August 31 - September 4, 2015. (Acceptance rate: 151/710 = 21.3%)
- Hiroshi Inoue, Moriyoshi Ohara and Kenjiro Taura, 'Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions', PVLDB 8(3), pp 293-304, presented in 41st International Conference on Very Large Data Bases (VLDB 2015). Kohala Coast, Hawaii, USA, August 31 - September 4, 2015. (Acceptance rate: 151/710 = 21.3%)
- Hiroshi Inoue and Toshio Nakatani, 'Adaptive SMT Control for More Responsive Web Applications', 2014 IEEE International Symposium on Workload Characterization (IISWC 2014). Raleigh, North Carolina, USA. October 26-28, 2014. pp 41-50. Presented in Best Paper Session. (Acceptance rate: 22/80 = 27.5%)
- Takuya Nakaike, Hiroshi Inoue, Toshio Suganuma, and Moriyoshi Ohara, 'Characterization of Call-Graph Profiles in Java Workloads', 2014 IEEE International Symposium on Workload Characterization (IISWC 2014). Raleigh, North Carolina, USA. pp 161-170. October 26-28, 2014. (Acceptance rate: 22/80 = 27.5%)
- Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu, and Toshio Nakatani, 'Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler', 2012 ACM Object-Oriented Programming, Systems, Languages & Applications (SPLASH/OOPSLA 2012). Tucson, Arizona, USA. pp 179-194, October 19-26, 2012. (Acceptance rate: 59/228 = 25.9%)
- Hiroshi Inoue and Toshio Nakatani, 'Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters', 2012 International Symposium on Memory Management (ISMM 2012). Beijing, China. pp 133-142. June 15-16, 2012. (Acceptance rate: 12/30 = 40.0%)
- Peng Wu, Hiroshige Hayashizaki, Hiroshi Inoue, and Toshio Nakatani, 'Reducing Trace Selection Footprint for Large-scale Java Applications with no Performance Loss', 2011 ACM Object-Oriented Programming, Systems, Languages & Applications (SPLASH/OOPSLA 2011). Portland, Oregon, USA. pp 789-804. October 22-27, 2011. (Acceptance rate: 61/166 = 36.7%)
- Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu, and Toshio Nakatani, 'A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler', 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011). Chamonix, France. pp. 246-256, April 2-6, 2011. (Acceptance rate: 28/105 = 26.7%)
- Hiroshige Hayashizaki, Peng Wu, Hiroshi Inoue, Mauricio Serrano and Toshio Nakatani, 'Improving the Performance of Trace-based Systems by False Loop Filtering', Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011). Newport Beach, California, USA. pp. 405-418, March 5-11, 2011. (Acceptance rate: 32/152 = 21.1%)
- Hiroshi Inoue and Toshio Nakatani, 'Performance of Multi-Process and Multi-Thread Processing on Multi-core SMT Processors', 2010 IEEE International Symposium on Workload Characterization (IISWC 2010). Atlanta, Georgia, USA. pp 209-218. December 2-4, 2010. (Acceptance rate: 21/56 = 37.5%)
- Hiroshi Inoue and Toshio Nakatani, 'How a Java VM Can Get More from a Hardware Performance Monitor', ACM SIGPLAN 2009 International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2009). Orlando, Florida, USA. pp. 137-154. October 25-29, 2009. (Acceptance rate: 25/144 = 17.4%)
- Hiroshi Inoue, Hideaki Komatsu, and Toshio Nakatani, 'A Study of Memory Management for Web-based Applications on Multicore Processors', ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009). Dublin, Ireland. pp. 386-396. June 15-20, 2009. (Acceptance rate: 41/196 = 20.9%)
- Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani, 'AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors', IEEE The Sixteenth International Conference on Parallel Architectures and Compilation Techniques (PACT 2007). Brasov, Romania. pp. 189-198. Sept. 15-19, 2007. (Acceptance rate: 34/175 = 19.4%)
- Jessica H. Tseng, Hao Yu, Shailabh Nagar, Niteesh Dubey, Hubertus Franke, Pratap Pattnaik, Hiroshi Inoue, and Toshio Nakatani, 'Performance Studies of Commercial Workloads on a Multi-core System', 2007 IEEE International Symposium on Workload Characterization (IISWC 2007), 2007.
- Moriyoshi Ohara, Hangu Yeo, Frank Savino, Giridharan Iyengar, Leiguang Gong, Hiroshi Inoue, Hideaki Komatsu, Vadim Sheinin, and Shahrokh Daijavad, 'Accelerating Mutual-Information-Based Linear Registration on the Cell Broadband Engine Processor', 2007 IEEE International Conference on Multimedia and Expo (ICME 2007), 2007.
- Moriyoshi Ohara, Hangu Yeo, Frank Savino, Giridharan Iyengar, Leiguang Gong, Hiroshi Inoue, Hideaki Komatsu, Vadim Sheinin, Shahrokh Daijavad, and Bradley Erickson, 'Real-time Mutual-information-based Linear Registration on the Cell Broadband Engine Processor', Fourth IEEE Symposium on Biomedical Imaging (ISBI 2007), 2007.
- Motohiro Kawahito, Hideaki Komatsu, Takao Moriyama, Hiroshi Inoue, and Toshio Nakatani, 'A New Idiom Recognition Framework for Exploiting Hardware-Assist Instructions', Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII), 2006.
- Hiroshi Inoue, Yohei Sato, and Koichi Hishida, 'Directional Scale Dependency on Force Coupling for Dispersed Two-Phase Turbulent Flows', Second International Symposium on Turbulence and Shear Flow Phenomena (TSFP2), 2001.
Journal Papers
- Motohiro Kawahito, Hideaki Komatsu, Takao Moriyama, Hiroshi Inoue, and Toshio Nakatani, 'Idiom Recognition Framework using Topological Embedding', ACM Transactions on Architecture and Code Optimization (TACO), Vol. 10(3), 2013.
- Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani, 'A high-performance sorting algorithm for multicore single-instruction multiple-data processors', Software: Practice and Experience, Vol. 42(6), pp. 753-777, 2012.
- Hiroshi Inoue, Hideaki Komatsu, Toshio Nakatani, 'Accelerating UTF-8 Decoding Using SIMD Instructions', Information Processing Society of Japan Transactions on Programming, Vol.1 (2), pp. 1-8, 2008. (in Japanese)
- Moriyoshi Ohara, Hiroshi Inoue, Yukihiko Sohda, Hideaki Komatsu, and Toshio Nakatani, 'MPI microtask for programming the Cell Broadband Engine processor', IBM Systems Journal, Vol 45 (1), pp. 85-102, 2006.
- Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, Toshio Nakatani, 'A fast sorting algorithm for VMX instruction set', Information Processing Society of Japan Transactions on Advanced Computing Systems, Vol 47 (ACS 14), pp. 105-114, 2006. (in Japanese)
- Hiroshi Inoue, Takao Moriyama, Yasushi Negishi, Moriyoshi Ohara, 'CPU resource reservation system for CPU using Simultaneous Multi Thread', Information Processing Society of Japan Transactions on Advanced Computing Systems, Vol.45 (ACS 5), pp. 21-28, 2004. (in Japanese)
Invited talk
- Hiroshi Inoue, 'Design and implementation of software stack for RaPiD AI Accelerator', IEICE-CPSY/DC, IPSJ-ARC Hot SPring Annual meeting 2021 (HotSPA2021). 2021.
- Hiroshi Inoue, 'Efficient Optimization of Diameter and Average Shortest Path Length of a Graph using Path Count Index', Graph Golf Workshop at The Fourth International Symposium on Computing and Networking (CANDAR'16). Hiroshima, Japan, 2016.
- Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu, and Toshio Nakatani, 'Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler', JSSST 2013, Tokyo, Japan, September 10-13, 2013.
- Hiroshi Inoue, 'A Trace-based Java JIT Compiler for Large-scale Applications', The 6th workshop on Virtual Machines and Intermediate Languages (VMIL2012). Tucson, Arizona, USA. 2012.
Workshop papers
- Peng Wu, Hiroshige Hayashizaki, Hiroshi Inoue, and Toshio Nakatani, 'Reducing Trace Selection Footprint for Large-scale Java Applications without Performance Loss', 10th Workshop on Compiler-Driven Performance, 2011.
- Peng Wu, Hiroshige Hayashizaki, and Hiroshi Inoue, 'Understand the Building Blocks of Trace Selection for a Trace-driven Language Compiler', 9th Workshop on Compiler-Driven Performance, 2010.
- Hiroshi Inoue, Hideaki Komatsu, Toshio Nakatani, 'Accelerating UTF-8 Decoding Using SIMD Instructions', IPSJ SIG Programming 68, Mar 17-18, 2008. (IPSJ Yamashita SIG Research Award)
- Moriyoshi Ohara, Hangu Yeo, Frank Savino, Giridharan Iyengar, Leiguang Gong, Hiroshi Inoue, Hideaki Komatsu, Vadim Sheinin, and Shahrokh Daijavad, 'Accelerating medical image registration on the Cell broadband engine processor', Second Workshop on Real Time and Interactive Digital Media Supercomputing (RIDMS-2), 2007.
- Daniel Citron, Hiroshi Inoue, Takao Moriyama, Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani, 'Exploiting the AltiVec Unit for Commercial Applications', Workshop on Computer Architecture Evaluation using Commercial Workloads, 2006.
Other publications
- Hiroshi Inoue and Tabari Alexander, 'Exploiting on-Chip AI Accelerator for High-Performance LLM Inference', PyTorch conference 2024, San Francisco, California, USA, Sept 18-19, 2024.
- Hiroshi Inoue, 'Multi-step LRU: SIMD-based Cache Replacement for Lower Overhead and Higher Precision', arXiv.09981 [cs.NI], 2021
- Swagath Venkataramani, Xiao Sun, Naigang Wang, Chia-yu Chen, Jungwook Choi, Mingu Kang, Ankur Agrawal, Jinwook Oh, Shubham Jain, Tina Babinsky, Nianzheng Cao, Thomas Fox, Bruce Fleischer, George Gristede, Michael Guillorn, Howard Haynie, Hiroshi Inoue, Kazuaki Ishizaki, Michael Klaiber, Shih-hsien Lo, Gary Maier, Silvia Mueller, Michael Scheuermann, Eri Ogawa, Marcel Schaal, Mauricio Serrano, Joel Silberman, Christos Vezyrtzis, Wei Wang, Fanchieh Yee, Jintao Zhang, Matthew Ziegler, Ching Zhou, Moriyoshi Ohara, Pong-fei Lu, Brian Curran, Sunil Shukla, Vijayalakshmi Srinivasan, Leland Chang, and Kailash Gopalakrishnan, 'Efficient AI System Design With Cross-Layer Approximate Computing', Proceedings of the IEEE, 2020
- Swagath Venkataramani, Jungwook Choi, Vijayalakshmi Srinivasan, Wei Wang, Jintao Zhang, Marcel Schaal, Mauricio Serrano, Kazuaki Ishizaki, Hiroshi Inoue, Eri Ogawa, Moriyoshi Ohara, Leland Chang, Kailash Gopalakrishnan, 'DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator', IEEE Micro, 2019
- Hiroshi Inoue, 'Multi-Sample Dropout for Accelerated Training and Better Generalization',  arXiv.09788 [cs.NE], 2019
- Hiroshi Inoue, 'Data Augmentation by Pairing Samples for Images Classification',  arXiv.02929 [cs.LG], 2018
- Masaru Ito, Hiroshi Inoue, Megumi Ito and Moriyoshi Ohara, 'FBWTMEM : computing maximal exact matches with FBWT', IBM Research Report RT0981, 2017.
- Hiroshi Inoue, 'Bring Apache Spark Closer to Accelerators', Workshop on Recent Topics in High Performance Computing (HPCAsia PC Workshop), 2017
- Hiroshi Inoue, 'Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks', IBM Research Report RT0978, also arXiv.08259 [cs.LG], 2017
- Blog post at Spark.tc, Gita Koblents, Kazuaki Ishizaki, Hiroshi Inoue, 'Bringing Apache Spark Closer to SIMD and GPU', http://www.spark.tc/simd-and-gpu/, 2016
- IBM Whitepaper, 'Real-Time Mutual-Information-Based Linear Registration on the Cell Broadband Engine Processor' (http://www-03.ibm.com/press/us/en/attachment/23251.wss?fileId=ATTACH_FILE2&fileName=cell-reg.pdf), 2007.
- Hiroshi Inoue, Takao Moriyama, Yasushi Negishi, and Moriyoshi Ohara, 'CPU Resource Reservation for Simultaneous Multi-Thread Systems', IBM Research Report, 2006.
Open source contributions
- Apache Spark, LLVM etc. (github)
Programming contest
- ACM ICFP programming contest 2013, 8th place in the main division and 5th place in the lightning division.
 
About my name: My name is so common in Japan that the DBLP entry for 'Hiroshi Inoue' includes many papers that are not mine. My author page in Google Scholar is kept up to date.
Awards and Honors
Jan. 2016, Outstanding Research Award, ACSI 2016
Jan. 2015, Outstanding Research Award, ACSI 2015
Jul. 2008, Yamashita SIG Research Award, Information Processing Society of Japan
Mar. 2000, Graduated with Honors, Keio University
Professional Activities (international conferences/journals)
- PPoPP 2026 PC member
- IEEE International Conference on Data Mining 2025 PC member
- IEEE Big Data 2025 PC member
- IEEE Big Data 2024 PC member
- IEEE Big Data 2023 PC member
- IEEE ICMLA 2023 PC member
- CGO 2023 PC member
- IEEE Big Data 2022 PC member
- IEEE ICMLA 2022 PC member
- IEEE Big Data 2021 PC member
- IEEE ICMLA 2021 PC member
- IEEE Big Data 2020 PC member
- IJCAI 2020 PC member
- IEEE Big Data 2019 PC member
- IJCAI 2019 PC member
- HPCAsia 2019 PC member
- CGO 2019 PC member
- IEEE Big Data 2018 PC member
- IEEE Cluster 2018 PC member (Programming and System Software area)
- HPCAsia 2018 PC member
- IEEE Big Data 2017 PC member
- IEEE Cluster 2017 PC member (Programming and System Software area)
- ICS 2016 PC member
- SAC 2016 Programming Languages track PC member
- IISWC 2015 PC member
- SAC 2015 Programming Languages track PC member
- PLDI 2015 External review committee member
- SAC 2014 Programming Languages track PC member
- SC13 PC member (programming systems track)
- PLDI 2013 External review committee member
- ASPLOS 2013 External review committee member
- Reviewer for ACM TACO, ACM TECS, ACM TOMS, ACM JETC, IEEE TC, IEEE TPDS, VLDB Journal, Software: Practice and Experience, etc.
Professional Activities (domestic conferences/journals)
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Forefront Computing (2026)
- xSIG2025 Organizing Committee member
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Forefront Computing (2025)
- xSIG2024 Organizing Committee member
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Forefront Computing (2024)
- xSIG2023 Organizing Chair
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Forefront Computing (2023)
- xSIG2022 Organizing Committee member
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Forefront Computing (2022)
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking (2021)
- xSIG 2020 PC chair
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking (2020)
- xSIG 2019 PC member
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking (2019)
- xSIG 2018 PC member
- IEICE Transactions on Information and Systems, Guest associate editor on Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking (2018)
- xSIG 2017 PC member
- ACSI 2016 PC member (programming and systems software track)
- ACSI 2015 PC member (programming and systems software track)
- SACSIS 2013 PC member and Track chair (programming and systems software track)
- Editor of IPSJ Magazine (Information Processing Society of Japan, April 2011 - Mar. 2015)
Society Governance
- Board member of JSSST (Japan Society for Software Science and Technology), June 2019 - June 2023
Teaching Activities
- Lecture for undergraduate students at Keio University (June 2024) on software stack for a deep learning accelerator
- Adjunct lecturer at Tokyo Institute of Technology (November 2022 - February 2023)
- Lecture at University of Tokyo (May 2022)
- Lecture at Keio University (June 2021) on software stack for a deep learning accelerator
- Lecture at University of Tokyo (July 2020)
- Adjunct lecturer at Tokyo Institute of Technology (November 2017 - February 2018)
- Adjunct lecturer at Tokyo Institute of Technology (December 2014 - March 2015)
Profile
Mar. 2000
Bachelor in System Design Engineering, Keio University, Japan
Bachelor thesis: Direct Numerical Simulation of Dispersed Two-Phase Turbulent Flows With Evaporating Droplets
Advisor: Prof. Koichi Hishida
Mar. 2002
Master in System Design Engineering, Keio University, Japan
Master thesis: Turbulence Modification by Dispersed Particles in Two-Phase Turbulent Flows
Advisor: Prof. Koichi Hishida
Mar. 2016
Ph.D. in Information Science and Technology, University of Tokyo, Japan
Ph.D. thesis: Efficient Exploitation of SIMD Instructions in Non-Numerical Applications
Advisor: Prof. Kenjiro Taura
Apr. 2002 - present
Researcher at IBM Research
Personal
Photography (TRL Camera club), playing futsal
National qualification of weather forecaster