Designed a five-stage pipelined RISC-V CPU (instruction fetch, decode, execute, memory access, and write-back) implementing the RV32I base integer instruction set
Implemented data hazard handling via forwarding logic and stall control, improving pipeline throughput
Integrated a branch prediction module to reduce control hazard penalties and increase IPC
Wrote synthesizable RTL in Verilog and ran ModelSim simulations for unit- and integration-level testing
Validated ISA compliance using the RISC-V architectural test suite
Collaborated with team members to integrate the CPU into a custom SoC with I/O peripherals and memory controllers
Performed FPGA-based validation on a Xilinx Artix-7 board, including clock domain integration and UART output
Used Synopsys Design Compiler to synthesize a gate-level netlist and analyze timing
Documented microarchitecture decisions, pipeline diagrams, and test results for project handoff
Participated in weekly code reviews to improve design robustness and maintain consistent coding standards
Designed a matrix multiplication engine based on a systolic-array topology to accelerate deep learning workloads
Mapped the convolutional and fully connected layers of ResNet-50 onto the accelerator using a custom compiler
Designed dataflow scheduling logic to support input reuse and partial result accumulation
Optimized on-chip memory allocation and buffering to reduce off-chip memory bandwidth demand
Implemented accelerator control logic in Verilog and integrated it into a hardware-software co-simulation environment
Conducted power and performance profiling using Synopsys PrimeTime PX, measuring a 3× improvement in energy efficiency
Verified the design using functional simulation, constrained-random stimulus, and assertion-based checks
Deployed the full system on an FPGA using Vivado and evaluated inference performance with quantized weights
Coordinated with software engineers to build a Python API and runtime driver for accelerator control
Collaborated with team members to benchmark performance against NVIDIA Jetson and Google Edge TPU
Implemented a wormhole-switched NoC protocol with credit-based flow control and virtual channels
Compared mesh, torus, and ring topologies under varying traffic loads using SystemC simulation
Designed router arbitration and switching fabric to reduce head-of-line blocking and increase throughput
Implemented NoC router components in Verilog RTL and synthesized them with Design Compiler
Analyzed latency and bandwidth metrics under uniform random, hotspot, and bursty traffic patterns
Developed a modular NoC testbench with randomized packet generators and latency monitors
Integrated NoC with a multi-core RISC-V SoC design, managing interface protocols and clock synchronization
Performed post-synthesis timing analysis and floorplan-aware optimization
Collaborated with SoC integration team to validate inter-core communication and coherency protocols
Participated in weekly architecture review meetings and contributed to NoC performance modeling
Architected an ISO 26262 ASIL-D-compliant vehicle network gateway supporting CAN FD and Ethernet TSN, implementing hardware-enforced firewall rules with configurable message-filtering matrices.
Developed a UVM verification environment with fault-injection capabilities using Synopsys VIP for Automotive, simulating electromagnetic interference (EMI) scenarios and bus contention errors.
Designed a dual-core lockstep ARM Cortex-R52 subsystem with cycle-by-cycle redundancy checkers, achieving 99.999% diagnostic coverage through formal property verification with Cadence JasperGold.
Implemented a secure over-the-air (OTA) update mechanism with cryptographic signature verification, integrating NIST-approved SHA-3 and AES-GCM hardware engines.
Built a Python-based traffic generator emulating real-world vehicle network patterns, validating the worst-case latency requirement of under 250 μs for brake-by-wire systems.
Deployed a CI/CD pipeline on GitLab runners with automated coverage merging, enabling nightly regressions across 500+ test scenarios on AWS EC2 FPGA instances.
Created an automated documentation generator linking requirements in DOORS Next to verification status, ensuring traceability for ISO/SAE 21434 cybersecurity certification.
Designed the digital front end for a 28 GHz phased-array system, implementing fixed-point-optimized complex matrix operations for 256-element beamforming.
Developed a UVM testbench integrated with MATLAB 5G Toolbox, validating the RTL against 3GPP NR FR2 reference waveforms through DPI-C-accelerated data paths.
Implemented an error-injection framework for the JESD204B ADC data interface, emulating phase noise and quantization errors on Synopsys HAPS prototyping hardware.
Created a parameterized Verilog generator for butterfly network topologies, enabling rapid reconfiguration between 4×4 and 8×8 MIMO configurations.
Built a power-aware validation suite using UPF 2.1, analyzing the impact of thermal throttling on beam-direction accuracy through PowerArtist simulations.
Deployed a machine-learning-based test-selection algorithm using XGBoost, reducing regression runtime by prioritizing tests that target high-impact coverage holes.
Integrated an ARM Cortex-M7 control plane with custom SIMD extensions for real-time beam-weight calculation.
