I’m currently a Machine Learning Engineer, 3D at Stability AI, doing R&D on generative 3D models such as SF3D. Before that, I worked as a Research Engineer at AMD Japan, doing R&D on path tracing with neural rendering and contributing to new hardware features for these workloads. I completed my master’s at the University of Tokyo, supervised by Prof. Toshiya Hachisuka and Prof. Nobuyuki Umetani. My research interests include light transport simulation, real-time rendering, neural rendering and machine learning.
Master’s in Information Science and Technology (Creative Informatics), 2021
The University of Tokyo
BE in Computer Engineering, 2016
Pune Institute of Computer Technology
R&D in Ray Tracing and Machine Learning for Neural Rendering.
Influenced multiple future hardware architectures on neural rendering by developing forward-looking workloads and evaluating them under different hardware constraints.
Developed an RDNA3 AI-accelerated demo in 2.5 weeks, shown at the Las Vegas launch event on November 3rd, using optimized fully fused MLP inference kernels and grid encoding. Wrote the inference kernels from scratch using WMMA instructions for the RDNA3 architecture. Demonstrated a 2.7x inference performance improvement on the 7900 XTX compared to the 6950 XT. Received an award from Rick Bergman for this effort.
Wrote a blog post on how to use RDNA3’s WMMA instructions, with sample code (https://gpuopen.com/learn/wmma_on_rdna3/), which was well received both internally and externally.
Re-implemented instant-ngp (Instant NeRF) from scratch for in-house research work, with competitive performance. The implementation includes fully-fused MLP kernels with WMMA optimizations (tensor core usage), grid encoding and occupancy grids.
Ported tiny-cuda-nn and instant-ngp to HIP, supporting RDNA3 and MI GPUs, utilizing the WMMA ops of RDNA3 and matrix cores of MI.
Wrote a single .exe prototype for running Stable Diffusion on AMD GPUs using SHARK from nod.ai. The prototype was adopted and is used extensively in production today. Collaborated with nod.ai to get it running on RDNA3. Won the RTG Next 5% award from David Wang for this work.
Enabled Windows support of the ROCm backend for nod.ai SHARK (later acquired by AMD). Demonstrated the viability of the ROCm backend on Windows as a more flexible alternative to the Vulkan backend. Received an executive spotlight award for this work.
Enabled HIP support for llamafile and contributed towards RDNA3 compatibility for llama.cpp on Windows.
Also worked on Radeon™ ProRender: delivered massive CPU performance improvements and developed a HIP-based backend for AMD GPUs, with an initial HIP port from OpenCL done within 48 hours.
During my stint at the Ecosystem Services Department, I helped design and ship global-scale web apps around the major ID platforms. I also worked horizontally across all ecosystem services (Points, Payments, Membership) to enhance technical quality across the board.
Notable projects:
Designed, developed and deployed a global-scale ID service layer using a multi-regional Kubernetes cluster with anycast routing on GCP. The service is currently used by multiple Rakuten services across the globe, supporting hundreds of millions of MAU.
Ebates+ID integration: Helped speed up production readiness for the integration of the new ID platform with Ebates. Deployed core components closer to the ID data centers in GCP, which massively decreased login latency, and used HAProxy for fault tolerance.
GCP integration with Rakuten’s data centers: As a consultant to the cloud and internal network teams, I helped set up GCP’s Cloud Interconnect and integrate it with all ESD projects that required connectivity to on-premise resources.
Roles and responsibilities as architect:
Cloud architect: Enforced governance and the principle of least privilege for all cloud users within the department; introduced real-time billing updates for senior management and helped decrease yearly cloud bills by ~¥12 million. Introduced best practices for cloud security and advised teams on optimal GCP usage.
Release reviews: Led weekly release reviews of the Points, Payments and Membership sections. Enforced the three pillars of a successful release: safe, repeatable and stress-free. This helped reduce release issues and increase the number of fully automated releases.
Mentorship: Mentored new engineers through their onboarding and training, and organised training projects for interns from various Canadian universities as part of their co-op program. The training introduced them to new technologies and platforms, internal and external, including Docker, Kubernetes and Kotlin, to help them get up to speed with ESD’s existing projects.