Categories

Work Experiences

2025 Middle Aged Work Advice
2024 A baby and a PhD
2023 Taking a break from NLP Research
2022 You don't feel like you're good enough, but its not a competition
2021 You're not doing well, but motivation is optional
2018 From psychologist to CS PhD Student
2017 Communicating Data Science
2017 Cross disciplinary projects
2015 DSO Advice

Bayesian Inference

2022 Stochastic Gradient Langevin Dynamics
2022 Minimum Bayes Risk Decoding
2019 Coordinate Ascent Mean-field Variational Inference (Univariate Gaussian Example)
2019 Dirichlet Process Gaussian Mixture Models (Generation)
2017 Gibbs Sampling on Dirichlet Multinomial Naive Bayes (Text)
2017 Markov Chain Monte-Carlo
2017 EM Algorithm for Gaussian mixtures
2017 Conjugate Priors
2017 Closed form Bayesian Inference for Binomial distributions

Misc

2024 Vibe Coding Car Racing Simulator (Fail)
2024 First 100 words
2022 Could Large Language Models be conscious? (David Chalmers @ Neurips 2022)
2022 NYCMidnight-100words
2020 Adversarial NLP examples with Fast Gradient Sign Method
2019 Modes of Convergence
2018 Algorithms on Graphs: Fastest Route

Machine Learning

2023 Training Sparse Neural Networks with L0 Regularisation
2022 Stochastic Gradient Langevin Dynamics
2021 Formalising Analogies for A.I
2021 Likelihood weighted Sequential Importance Sampling
2020 Some QA from Deep Learning (CS 462/482)
2020 Variance of the Estimator in Machine Learning
2020 The Sigmoid in Regression, Neural Network Activation and LSTM Gates
2019 Coordinate Ascent Mean-field Variational Inference (Univariate Gaussian Example)
2018 Onboarding for Practical Machine Learning Research
2018 Jacobian, Chain rule and backpropagation
2018 Gradients, partial derivatives, directional derivatives, and gradient descent
2018 Calculus for Machine Learning

Calculus

2018 PyTorch Automatic differentiation for non-scalar variables; Reconstructing the Jacobian
2018 Lagrange Multipliers and Constrained Optimization
2018 Taylor Series approximation, newton's method and optimization
2018 Hessian, second order derivatives, convexity, and saddle points
2018 Jacobian, Chain rule and backpropagation
2018 Gradients, partial derivatives, directional derivatives, and gradient descent
2018 Derivatives, differentiability and loss functions
2018 Calculus for Machine Learning

Optimization

2022 Stochastic Gradient Langevin Dynamics
2018 Equivalence of constrained and unconstrained form for Ridge Regression
2018 Lagrange Multipliers and Constrained Optimization
2018 Taylor Series approximation, newton's method and optimization
2018 Gradients, partial derivatives, directional derivatives, and gradient descent

Code

2025 Reproducing GPT2-125M
2025 Dynamic Batching for Training Large Sequence Models (LLMs)
2024 Vibe Coding Car Racing Simulator (Fail)
2024 Chunking code for RAG; parsing-recursion-stack
2024 Python Decorators for Monitoring GPU Usage
2023 Monitoring Jobs on the Server
2023 Lean OmegaConf Argparse System
2022 Recipe for connecting to Google Drive from Remote Server
2020 A minimum keystroke (py)Debugger for Lazy ML/DS people who don't IDE
2020 Recipe for building jq from source without admin(sudo) rights
2018 Gotchas in Cython; Handling numpy arrays in cython class
2018 Migrating from python 2.7 to python 3 (and maintaining compatibility)

Projects

2025 RAG System Architecture
2024 Data Extraction for Unstructured Document Data
2020 A minimum keystroke (py)Debugger for Lazy ML/DS people who don't IDE
2018 Studying drug-drug interactions and predictors of adverse vascular outcomes
2018 Capturing Last-mile Transactions of Smallholder Palm Oil Farmers

PyTorch

2025 Dynamic Batching for Training Large Sequence Models (LLMs)
2024 Python Decorators for Monitoring GPU Usage
2019 Clean TreeLSTMs implementation in PyTorch using NLTK treepositions and Easy-First Parsing
2019 Pad pack sequences for Pytorch batch processing with DataLoader
2018 PyTorch Automatic differentiation for non-scalar variables; Reconstructing the Jacobian

Review

2022 NLP Papers at ICML2022
2021 Neural Tangent Kernel, Every Model trained by GD is a kernel machine (Review)
2020 NEURIPS 2020
2020 EMNLP 2020
2020 Some Clustering Papers at ICLR20
2019 Arithmetic(Book)

Machine Translation

2022 Minimum Bayes Risk Decoding

Reinforcement Learning

2024 Deriving the Basic Policy Gradient Update (REINFORCE)
2024 Temporal Difference Learning: Taking advantage of Incomplete Trajectories
2023 Dynamic Programming for Reinforcement Learning, the importance of the Bellman equations; (with Gymnasium)

Compression

2023 Training Sparse Neural Networks with L0 Regularisation

Generative Models

2024 Deriving the minimax equation for GANs
2024 A classical NLP researcher and a GPT-era Engineer meet at the coffee machine
2023 LLM Research and Adaptation Landscape

NLP

2024 Synthetic Question Generation for Retrieval Evaluation of RAG