Education
Xi'an Jiaotong University
Computer Science Undergraduate, 2019.09 ~ 2023.07 (expected)
GPA (first 3 years): 91.278/100, rank 4/171 (Top 2.4%)
Papers & Patents
- Tian Xia, Gelin Fu, Chenyang Li, Zhongpei Luo, Wenzhe Zhao, Nanning Zheng and Pengju Ren*, "A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022 (CCF A)
- Chenyang Li, Tian Xia*, Wenzhe Zhao, Nanning Zheng and Pengju Ren, "SpV8: Pursuing Optimal Vectorization and Regular Computation Pattern in SpMV", The 58th Annual ACM/IEEE Design Automation Conference (DAC), 2021 (CCF A)
- Haoran Zhao, Tian Xia, Chenyang Li, Wenzhe Zhao, Nanning Zheng and Pengju Ren*, "Exploring Better Speculation and Data Locality in Sparse Matrix-Vector Multiplication on Intel Xeon", IEEE International Conference on Computer Design (ICCD), 2020 (CCF B)
- Chenyang Li (李辰洋), "A Method for News Aggregation and Intelligent Entity Association", China Patent (Granted)
Internship
Wangxuan Institute of Computer Technology, Peking University
Research Intern in System Security, 2022.10 ~ now
Conduct research on DMA security under the supervision of Prof. Xinhui Han.
NVIDIA Global Performance Lab, Shanghai
Software Development & Embedded System Intern, 2022.07 ~ 2023.01
Help bring up an internal project from prototype to production-ready state.
Independently contributed to the web frontend and backend, the video encoding/streaming pipeline, Linux driver customization, Yocto-based distribution builds, OTA upgrades, and 3D case design.
Network and Information Security Lab, Tsinghua University
Research Intern in System Security, 2021.12 ~ now
Conduct research on hardware-software co-design security under the supervision of Prof. Chao Zhang.
Institute of AI and Robotics, Xi'an Jiaotong University
Research Intern in Computer Architecture, 2019.09 ~ 2022.06
Led research on SpMV performance optimization under the co-supervision of Prof. Pengju Ren and Prof. Tian Xia.
This work was published at DAC 2021 (CCF A, first author) and ICCD 2020, achieving a 2.4x speedup over Intel MKL.
SpMV (sparse matrix-vector multiplication) is a frequently invoked routine in HPC, but its highly irregular computation results in low efficiency on commodity hardware. During this research, I combined many strategies to improve performance: SIMD vectorization, multicore parallelism, cache locality optimization, explicit prefetching, a row-column hybrid storage format, NUMA awareness, and work stealing.
Projects
rv5stage, an RV32I core with cache coherence support
Computer Organization Course Project (solo, A+), 2021.11
- 6-stage, vendor-agnostic, FPGA-optimized pipeline running at ~70 MHz on a Xilinx XC7A100T-1
- Separate direct-mapped instruction and data caches
- Full-system simulation with Verilator to avoid slow, opaque on-FPGA debugging
- Cache coherence based on a custom bus and an SI (Shared/Invalid) protocol
- UART controller at 115200 baud
HougeOS, a multicore bootloader for the Raspberry Pi 4
Operating System Course Project (solo, A+), 2021.11
- Boots from the Raspberry Pi's closed-source firmware.
- I wrote a library of concurrency primitives for the kernel, based on the Armv8.0-A LL/SC (load-exclusive/store-exclusive) mechanism.
- During the project, I used OpenOCD JTAG debugging to find an early-stage memory attribute misconfiguration bug in ChCore (an educational OS from IPADS, SJTU) that prevented the core from booting on a real board. The bug was confirmed by Prof. Yubin Xia and Prof. Jinyu Gu of IPADS.
MIT 6.S081 (6.828) 2019 RISC-V Lab
Self-taught, 2019.10
Completed all lab assignments of the course; the repository has 41 stars on GitHub.
Skills
Languages
- Chinese (native)
- English (academic communication), CET-4 574, CET-6 560
Programming Languages
Ordered by proficiency.
Software: Python 3 (2012), C (2006), Java (2007), C++ (2006), JavaScript (2010), Nix (2022), CUDA (2014), PHP (2012)
Hardware: SIMD intrinsics (AVX-512 & SVE) (2019), SystemVerilog (2020), SpinalHDL (2021), Chisel (2020)
Tools
Software: GDB, QEMU, Linux perf, Intel PT/PEBS, Intel VTune
Hardware: gem5, Vivado, Verilator, Cadence Allegro, Cadence OrCAD, OpenOCD
Mechanical: OpenSCAD, Autodesk Inventor