
Work Experiences

2024   A baby and a PhD
2023   Taking a break from NLP Research
2022   You don't feel like you're good enough, but it's not a competition
2021   You're not doing well, but motivation is optional
2018   From psychologist to CS PhD Student
2017   Communicating Data Science
2017   Cross disciplinary projects
2015   DSO Advice

Bayesian Inference

2022   Stochastic Gradient Langevin Dynamics
2022   Minimum Bayes Risk Decoding
2019   Coordinate Ascent Mean-field Variational Inference (Univariate Gaussian Example)
2019   Dirichlet Process Gaussian Mixture Models (Generation)
2017   Gibbs Sampling on Dirichlet Multinomial Naive Bayes (Text)
2017   Markov Chain Monte-Carlo
2017   EM Algorithm for Gaussian mixtures
2017   Conjugate Priors
2017   Closed form Bayesian Inference for Binomial distributions

Misc

2024   Vibe Coding Car Racing Simulator (Fail)
2024   First 100 words
2022   Could Large Language Models be conscious? (David Chalmers @ NeurIPS 2022)
2022   NYCMidnight-100words
2020   Adversarial NLP examples with Fast Gradient Sign Method
2019   Modes of Convergence
2018   Algorithms on Graphs: Fastest Route

Machine Learning

2023   Training Sparse Neural Networks with L0 Regularisation
2022   Stochastic Gradient Langevin Dynamics
2021   Formalising Analogies for A.I
2021   Likelihood weighted Sequential Importance Sampling
2020   Some QA from Deep Learning (CS 462/482)
2020   Variance of the Estimator in Machine Learning
2020   The Sigmoid in Regression, Neural Network Activation and LSTM Gates
2019   Coordinate Ascent Mean-field Variational Inference (Univariate Gaussian Example)
2018   Onboarding for Practical Machine Learning Research
2018   Jacobian, Chain rule and backpropagation
2018   Gradients, partial derivatives, directional derivatives, and gradient descent
2018   Calculus for Machine Learning

Calculus

2018   PyTorch Automatic differentiation for non-scalar variables; Reconstructing the Jacobian
2018   Lagrange Multipliers and Constrained Optimization
2018   Taylor Series approximation, Newton's method and optimization
2018   Hessian, second order derivatives, convexity, and saddle points
2018   Jacobian, Chain rule and backpropagation
2018   Gradients, partial derivatives, directional derivatives, and gradient descent
2018   Derivatives, differentiability and loss functions
2018   Calculus for Machine Learning

Optimization

2022   Stochastic Gradient Langevin Dynamics
2018   Equivalence of constrained and unconstrained form for Ridge Regression
2018   Lagrange Multipliers and Constrained Optimization
2018   Taylor Series approximation, Newton's method and optimization
2018   Gradients, partial derivatives, directional derivatives, and gradient descent

Code

2025   Dynamic Batching for Training Large Sequence Models (LLMs)
2024   Vibe Coding Car Racing Simulator (Fail)
2024   Chunking code for RAG; parsing-recursion-stack
2024   Python Decorators for Monitoring GPU Usage
2023   Monitoring Jobs on the Server
2023   Lean OmegaConf Argparse System
2022   Recipe for connecting to Google Drive from Remote Server
2020   A minimum keystroke (py)Debugger for Lazy ML/DS people who don't IDE
2020   Recipe for building jq from source without admin(sudo) rights
2018   Gotchas in Cython; Handling numpy arrays in Cython class
2018   Migrating from Python 2.7 to Python 3 (and maintaining compatibility)

Projects

2025   RAG System Architecture
2024   Data Extraction for Unstructured Document Data
2020   A minimum keystroke (py)Debugger for Lazy ML/DS people who don't IDE
2018   Studying drug-drug interactions and predictors of adverse vascular outcomes
2018   Capturing Last-mile Transactions of Smallholder Palm Oil Farmers

PyTorch

2025   Dynamic Batching for Training Large Sequence Models (LLMs)
2024   Python Decorators for Monitoring GPU Usage
2019   Clean TreeLSTMs implementation in PyTorch using NLTK treepositions and Easy-First Parsing
2019   Pad pack sequences for PyTorch batch processing with DataLoader
2018   PyTorch Automatic differentiation for non-scalar variables; Reconstructing the Jacobian

Review

2022   NLP Papers at ICML2022
2021   Neural Tangent Kernel, Every Model trained by GD is a kernel machine (Review)
2020   NeurIPS 2020
2020   EMNLP 2020
2020   Some Clustering Papers at ICLR20
2019   Arithmetic(Book)

Machine Translation

2022   Minimum Bayes Risk Decoding

Reinforcement Learning

2024   Deriving the Basic Policy Gradient Update (REINFORCE)
2024   Temporal Difference Learning: Taking advantage of Incomplete Trajectories
2023   Dynamic Programming for Reinforcement Learning, the importance of the Bellman equations; (with Gymnasium)

Compression

2023   Training Sparse Neural Networks with L0 Regularisation

Generative Models

2024   Deriving the minimax equation for GANs
2024   A classical NLP researcher and a GPT-era Engineer meet at the coffee machine
2023   LLM Research and Adaptation Landscape

NLP

2024   Synthetic Question Generation for Retrieval Evaluation of RAG
  • Quality means doing it right when no one is looking - Henry Ford
  • suzyahyah

The best time to plant a tree was 20 years ago. The second best time is now. - Japanese proverb

Since October 2017