Developed a pipeline for 3D reconstruction of an experimental motif using Neural Radiance Fields (NeRF).
Extended NeRF to image data from different imaging processes by designing a virtual camera with an end-to-end differentiable rendering process (rendering sketch below); adapted to dynamic structures with an angle-perturbation module.
Improved the resolution and efficiency of image rendering with Instant-NGP, achieving up to 10x faster rendering.
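For context on the NeRF bullets above, a minimal sketch of differentiable volume rendering in PyTorch. It is illustrative only: `TinyNeRF`, `render_rays`, and all hyperparameters are hypothetical stand-ins, not the project's actual virtual-camera pipeline.

```python
# Minimal sketch of NeRF-style differentiable volume rendering (illustrative
# only; names and settings are hypothetical, not the original pipeline).
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Map 3D points to sin/cos features, as in the original NeRF paper."""
    feats = [x]
    for i in range(num_freqs):
        for fn in (torch.sin, torch.cos):
            feats.append(fn((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    """Small MLP predicting density (sigma) and RGB color per sample point."""
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 + 3 * 2 * num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # sigma + RGB
        )

    def forward(self, pts):
        out = self.mlp(positional_encoding(pts))
        sigma = torch.relu(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:])
        return sigma, rgb

def render_rays(model, origins, dirs, near=2.0, far=6.0, n_samples=64):
    """Differentiable volume rendering: composite color along each ray."""
    t = torch.linspace(near, far, n_samples, device=origins.device)       # (S,)
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]       # (R,S,3)
    sigma, rgb = model(pts)                                               # (R,S,1),(R,S,3)
    delta = torch.cat([t[1:] - t[:-1], torch.tensor([1e10], device=t.device)])
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)                   # (R,S)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                               # (R,S)
    return (weights[..., None] * rgb).sum(dim=1)                          # (R,3)
```

Because every step (encoding, MLP, compositing) is a differentiable tensor op, gradients flow from rendered pixels back to the scene representation, which is what makes the end-to-end virtual-camera design possible.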
Designed a U-Net architecture with a self-attention mechanism for better context awareness (attention-block sketch below).
Achieved 67% mean IoU and 65% F1-score on the KITTI dataset.
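A hedged sketch of the kind of self-attention block that can sit at a U-Net bottleneck; the class name `SelfAttention2d`, the channel counts, and the single-head design are assumptions, not the exact architecture used.

```python
# Illustrative self-attention block for 2D feature maps (single-head,
# non-local attention over spatial positions); not the exact model.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)                      # (B, HW, C/8)
        k = self.k(x).flatten(2)                                      # (B, C/8, HW)
        v = self.v(x).flatten(2)                                      # (B, C, HW)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)    # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                                   # residual connection

# Example placement between encoder and decoder (channel count is an assumption):
# bottleneck = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), SelfAttention2d(256))
```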
Developed an end-to-end pipeline that extracted visual feature tracks from temporal keyframes, performed 6-DoF pose estimation and 3D mapping with COLMAP, and aligned a query frame with previously captured frames.
Developed a low-cost process to guide alignment with a previously selected/captured scene using gyroscope measurements and RANSAC-filtered visual feature tracks (pose-estimation sketch below).
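A minimal sketch of RANSAC-filtered frame alignment in the spirit of the bullets above, using ORB features and OpenCV's essential-matrix estimation. The intrinsics `K`, the image paths, and the function name are placeholders, and the gyroscope prior is omitted.

```python
# Align a query frame to a reference frame: ORB matching + RANSAC essential
# matrix + relative pose recovery. Paths and intrinsics are placeholders.
import cv2
import numpy as np

def estimate_relative_pose(ref_img, query_img, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(ref_img, None)
    kp2, des2 = orb.detectAndCompute(query_img, None)

    # Brute-force Hamming matching with cross-check filtering.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects outlier correspondences while fitting the essential matrix.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t, int(np.count_nonzero(inliers))

if __name__ == "__main__":
    K = np.array([[718.0, 0, 607.0], [0, 718.0, 185.0], [0, 0, 1.0]])  # example intrinsics
    ref = cv2.imread("ref.png", cv2.IMREAD_GRAYSCALE)      # placeholder paths
    qry = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
    R, t, n_inliers = estimate_relative_pose(ref, qry, K)
```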
Crawled 114,687 images with Python; built and trained a Faster R-CNN on the WIDER FACE dataset in PyTorch, detecting and extracting 94% of faces.
Trained and applied an SRGAN model in PyTorch to upscale and standardize image resolution from under 100x100 to 256x256.
Implemented, trained, and applied an 8-layer StyleGAN in PyTorch to generate cartoon faces whose style varies with the input image and user commands.
Added style-mixing effects to the generated images by controlling the model's latent spaces.
Implemented a denoising diffusion probabilistic model (DDPM) and trained it on 10,881 face images (training-step sketch below).
Constructed GPU infrastructure and implemented parallel computation in C++, improving runtime performance by 12x.
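A hedged sketch of a DDPM training step (forward noising plus noise-prediction loss) corresponding to the diffusion bullet above. The linear schedule and the assumption that `model(x_t, t)` is a U-Net-style noise predictor are illustrative, not the project's exact settings.

```python
# DDPM training objective following Ho et al. 2020: noise a clean image x0
# to timestep t and train the model to predict the added noise.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

def ddpm_loss(model, x0):
    """x0: (B, C, H, W) clean images; model: any epsilon-predictor taking (x_t, t)."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # forward diffusion
    return F.mse_loss(model(x_t, t), noise)                  # noise-prediction loss
```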
Established an NLP text-summarization system with TensorFlow in an environment built on NVIDIA Docker.
Analyzed and processed 2 TB of text-summarization datasets with NLTK, including THUCNews, LCSTS, CSL, news headlines, contexts, and judicial summaries.
Benchmarked Pointer-Generator, WoBERT, Nezha, and T5 on the above datasets to produce suitable titles/abstracts, guidelines, and summaries (summarization sketch below).
Applied BERT and WoBERT to improve prediction accuracy by 5.6% and 4.5%, respectively.
Accelerated inference of these Transformer-based models by 1.55x with TurboTransformers.
Accelerated modeling runtime by 8x with GPU-based parallel computation in CUDA.
Applied the BERT-of-Theseus method to distill WoBERT, shrinking the model to 50% of its original size.
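For illustration of the summarization benchmarking above, a minimal inference sketch using a T5 checkpoint via Hugging Face transformers. The `t5-small` checkpoint and generation settings are examples, not the fine-tuned models or datasets from the project.

```python
# Abstractive summarization with a T5 checkpoint (example checkpoint only).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def summarize(text, max_new_tokens=64):
    # T5 uses a task prefix; truncate long articles to the encoder limit.
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=512)
    ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(summarize("Long news article text goes here ..."))
```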
Designed algorithms to automate the tuning of ML/DL workloads at the compilation stage for improved inference speed.
Proposed a novel method, implemented in C++, for dataflow analysis of workloads with hardware-ISA-specific instructions, speeding up extraction of graph-level and assembly-level embeddings by 70%.
Devised a neural-network training system backed by a JSON database for workload cost modeling (cost-model sketch below), and implemented a distributed, asynchronous evaluation pipeline for tensor programs in Go, achieving a 20x speedup.
Gathered a dataset of over 1M tuning candidates from 114 popular deep learning models, paving the way for future research.
Enhanced the learning-driven MetaSchedule framework and automated the pipeline of the TVM compiler stack.
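A hedged sketch of the learned cost-model idea mentioned above: a small MLP regresses measured latency from per-candidate feature embeddings. The feature dimension, architecture, and synthetic data are assumptions; the actual system stored candidates in a JSON database and ran distributed, asynchronous evaluation in Go.

```python
# Toy learned cost model for tensor-program tuning: predict latency (or a
# ranking score) from per-candidate feature embeddings. All sizes are assumed.
import torch
import torch.nn as nn

class CostModel(nn.Module):
    def __init__(self, feat_dim=164, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):                 # feats: (batch, feat_dim)
        return self.net(feats).squeeze(-1)    # predicted latency per candidate

model = CostModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic (features, measured latency) pairs standing in for tuning records.
feats = torch.randn(1024, 164)
latency = torch.rand(1024)
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(feats), latency)
    loss.backward()
    opt.step()
```

In practice such a model is used to score candidate schedules cheaply so that only the most promising ones are compiled and measured on hardware.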
Developed a universal pipeline enabling seamless deployment of LLMs on various hardware backends.
Implemented the C++ backend and applied kernel optimizations to the models, achieving up to a 4x speedup over llama.cpp.
Launched a concise, Hugging Face-style Python API for model loading and inference, packaged as pip/Conda wheels with Docker images.
Devised a Siri-like iOS app in Objective-C and Swift supporting a multimodal, real-time question-answering chatbot.
Built an interactive Gradio frontend for chat visualization (sketch below) and a WebGPU-based chat runtime in TypeScript.
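A minimal sketch of a Gradio chat frontend like the one described above; `generate_reply` is a placeholder that echoes the input where the deployed LLM runtime would actually be called.

```python
# Minimal Gradio chat UI; swap generate_reply for a call to the LLM runtime.
import gradio as gr

def generate_reply(message, history):
    # Placeholder: invoke the deployed model here and return its response text.
    return f"(echo) {message}"

demo = gr.ChatInterface(fn=generate_reply, title="LLM Chat Demo")

if __name__ == "__main__":
    demo.launch()
```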