-
Deriving the Basic Policy Gradient Update (REINFORCE)
-
Data Extraction for Unstructured Document Data
-
First 100 words
-
Temporal Difference Learning: Taking advantage of Incomplete Trajectories
-
Synthetic Question Generation for Retrieval Evaluation of RAG
-
Deriving the minimax equation for GANs
-
Chunking code for RAG; parsing-recursion-stack
-
A classical NLP researcher and a GPT-era Engineer meet at the coffee machine
-
Python Decorators for Monitoring GPU Usage
-
A baby and a PhD
-
LLM Research and Adaptation Landscape
-
Monitoring Jobs on the Server
-
Taking a break from NLP Research
-
Lean OmegaConf Argparse System
Discusses a YAML-based hierarchical configuration system called OmegaConf, which is useful for managing configurations across multiple sources. It explains the challenges of using argparse for nested configurations and introduces a custom file naming system for easy organization and retrieval of project-specific files. The post concludes by highlighting the benefits of this approach in terms of reducing project switch lag, improving file naming, and streamlining configuration management for various projects. (Code)
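As a flavour of the approach, here is a minimal sketch of merging a hierarchical YAML config with command-line overrides using OmegaConf; the config keys and values are hypothetical, not taken from the post.

```python
from omegaconf import OmegaConf

# In practice the base config would live in a YAML file and be read with
# OmegaConf.load("config.yaml"); it is created inline here to stay self-contained.
base = OmegaConf.create("""
model:
  name: bert-base
  lr: 3e-5
data:
  batch_size: 32
""")

cli = OmegaConf.from_cli()        # e.g. python train.py model.lr=1e-4
cfg = OmegaConf.merge(base, cli)  # dotlist CLI values override the YAML

print(cfg.model.lr)               # nested access with dot notation
```
-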
Training Sparse Neural Networks with L0 Regularisation
Explores L0 norm regularization for training sparse neural networks, where weights are encouraged to be exactly 0. It discusses overcoming non-differentiability issues by using a soft form of counting and reparameterization tricks. The post also delves into concrete distributions and introduces a stretched, hard-thresholded variant that makes the continuous distribution more suitable for regularization. (Machine Learning)
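A minimal sketch of the hard-concrete gate behind this kind of L0 training, assuming PyTorch; the hyperparameter values follow the defaults in Louizos et al. (2018), while the module structure and names are illustrative rather than the post's code.

```python
import torch
import torch.nn as nn

class L0Gate(nn.Module):
    def __init__(self, n, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n))  # location of each gate
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        # Reparameterised sample from the hard-concrete distribution.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        s = s * (self.zeta - self.gamma) + self.gamma  # stretch to (gamma, zeta)
        return s.clamp(0.0, 1.0)                       # hard threshold: exact zeros

    def l0_penalty(self):
        # Differentiable expected number of non-zero gates ("soft counting").
        return torch.sigmoid(
            self.log_alpha - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).sum()

# Usage: multiply weights (or activations) by gate(), add l0_penalty() to the loss.
gate = L0Gate(10)
z = gate()                          # many entries are exactly 0
loss_reg = 1e-3 * gate.l0_penalty()
```
-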
Dynamic Programming for Reinforcement Learning, the importance of the Bellman equations; (with Gymnasium)
Explains the Bellman optimality equation, which is crucial for finding the optimal policy in Markov Decision Processes (MDPs) using dynamic programming. It explores the concept of value functions, both for states and state-action pairs, and how they are essential in reinforcement learning. The post also discusses the Policy Iteration algorithm, breaking it down into policy evaluation and policy improvement phases. It provides insights into the centrality of Bellman equations in RL optimization and concludes with experiments on the Gymnasium Frozen Lake environment. (Reinforcement Learning)
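A compact policy-iteration sketch on Gymnasium's FrozenLake, showing the evaluation and improvement phases described above; the discount factor and convergence threshold are arbitrary choices of mine, not the post's.

```python
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1")
P = env.unwrapped.P                   # P[s][a] = [(prob, s', reward, done), ...]
nS, nA, gamma = env.observation_space.n, env.action_space.n, 0.99
policy, V = np.zeros(nS, dtype=int), np.zeros(nS)

while True:
    # Policy evaluation: iterate the Bellman expectation equation to a fixed point.
    while True:
        delta = 0.0
        for s in range(nS):
            v = sum(p * (r + gamma * V[s2] * (not done))
                    for p, s2, r, done in P[s][policy[s]])
            delta, V[s] = max(delta, abs(v - V[s])), v
        if delta < 1e-8:
            break
    # Policy improvement: act greedily with respect to the current value function.
    stable = True
    for s in range(nS):
        q = [sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])
             for a in range(nA)]
        best = int(np.argmax(q))
        if best != policy[s]:
            policy[s], stable = best, False
    if stable:                        # greedy policy unchanged => optimal
        break
```
-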
Could Large Language Models be conscious? (David Chalmers @ Neurips 2022)
The debate about consciousness in large language models (LLMs) is complex, with arguments both for and against. David Chalmers defines consciousness as subjective experience and suggests breaking it down into dimensions like sensory, affective, cognitive, agentive, and self-consciousness. Arguments for consciousness in LLMs include self-reporting and conversational abilities, but these are debatable. Arguments against involve the absence of biology, senses, world-models, self-models, unified agency, and recurrent processing. The debate continues, with the likelihood of LLM consciousness emerging in the next 10-20 years put at roughly 50-50. (Misc)
-
NLP Papers at ICML2022
Reviews NLP papers presented at ICML 2022, covering topics such as co-training large language models with smaller models, derivative-free optimization for language models, interpretable text modeling, generative cooperative networks for language generation, language model architectures for zero-shot generalization, coherent entity use in narrative generation, retrieval-augmented language models, and self-conditioning pre-trained language models. Some papers proposed new methods, while others explored existing techniques in various ways. (Review)
-
You don't feel like you're good enough, but it's not a competition
Success in academia shouldn't be about competition but about gaining and sharing knowledge. External metrics like publications, citations, and retweets don't define your true academic worth; what you learn and contribute matters more. Your unique journey in the pursuit of knowledge, no matter how long, will ultimately lead to discoveries meant for you to find and share with the world. (Work Experiences)
-
Stochastic Gradient Langevin Dynamics
Stochastic Gradient Langevin Dynamics (SGLD) is a technique that combines stochastic gradient descent with Markov Chain Monte Carlo (MCMC) to efficiently explore high-dimensional parameter spaces, often used in Bayesian deep learning. It approximates Langevin dynamics by discretizing the Langevin equations, substituting stochastic minibatch gradients, injecting Gaussian noise at each step, and skipping the accept-reject step as the step size decreases, so that the iterates eventually converge to the desired posterior distribution. Although it can look like optimization with added noise, treating it only that way misses the posterior it is sampling from. Its convergence theory is still a subject of research, but it offers a convenient way to turn an optimizer into an approximate sampler. (Bayesian Inference, Machine Learning, Optimization)
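A toy sketch of the SGLD update on a one-dimensional Gaussian-mean model; the model, step-size schedule, and minibatch size are illustrative assumptions, with the schedule shape following Welling & Teh (2011).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.5, 1.0, size=1000)   # synthetic observations, x ~ N(theta, 1)
N, n = len(data), 32                     # dataset size, minibatch size
theta, samples = 0.0, []

for t in range(1, 5001):
    eps = 1e-2 * t ** -0.55              # decaying step size
    batch = rng.choice(data, n)
    grad_prior = -theta                              # d/dtheta log N(theta | 0, 1)
    grad_lik = (N / n) * np.sum(batch - theta)       # rescaled minibatch gradient
    theta += 0.5 * eps * (grad_prior + grad_lik) \
             + rng.normal(0.0, np.sqrt(eps))         # injected Gaussian noise
    samples.append(theta)

print(np.mean(samples[1000:]))           # approximates the posterior mean (~1.5)
```
-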
NYCMidnight-100words
-
Recipe for connecting to Google Drive from Remote Server
-
Minimum Bayes Risk Decoding
Minimum Bayes Risk (MBR) Decoding is a decision-making approach used in fields like Automatic Speech Recognition and Machine Translation. It aims to select the best sequence or hypothesis from a set of possibilities by minimizing the expected loss over a probability distribution of sequences. Because that expectation must be approximated in practice, MBR decoding acts as a form of consensus decoding and is an alternative to Beam Search, the default decoding method in sequence models; it does not necessarily minimize the actual Bayes Risk. Its effectiveness depends on the nature of the learned probability distribution, and it may struggle with pathological or unusual sequences. Researchers have flexibility in applying various loss functions to the decoding process. (Bayesian Inference)
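A minimal sketch of sample-based MBR decoding; the unigram-overlap utility is a stand-in for whatever task metric (BLEU, word error rate, etc.) one would actually plug in, and the candidates are made up rather than sampled from a real model.

```python
from collections import Counter

def utility(hyp, ref):
    """Toy utility: unigram overlap between two token lists."""
    common = Counter(hyp) & Counter(ref)
    return sum(common.values()) / max(len(hyp), len(ref), 1)

def mbr_decode(candidates):
    """Pick the candidate with the highest expected utility, approximating the
    model's distribution by the empirical distribution of the samples."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        score = sum(utility(hyp, ref) for ref in candidates) / len(candidates)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Candidates would normally be sampled from the sequence model.
samples = [["the", "cat", "sat"], ["the", "cat", "sits"], ["a", "dog", "ran"]]
print(mbr_decode(samples))   # the hypothesis most similar to the others wins
```
-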
Formalising Analogies for A.I.
Analogies are comparisons between two things based on their similarities and are essential for human communication and reasoning. However, defining and using analogies in AI is challenging due to their complexity. Various formalisms, such as arithmetic, geometric, logical, algebraic, complexity, and functional views, have been proposed to represent and use analogies computationally. Each approach has its strengths and weaknesses, and there is ongoing research in this field. (Machine Learning)
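As an illustration of the arithmetic view only (one of the several formalisms above), here is a toy sketch where a : b :: c : ? is answered by nearest-neighbour search around b - a + c; the miniature vectors are made up for the example.

```python
import numpy as np

# Hand-crafted toy "embeddings"; real systems would use learned word vectors.
vocab = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([0.2, 0.0, 1.0]),
    "queen": np.array([0.2, 1.0, 1.0]),
}

def solve_analogy(a, b, c):
    target = vocab[b] - vocab[a] + vocab[c]   # the "arithmetic view" of a:b::c:?
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # Nearest neighbour by cosine similarity, excluding the query words.
    return max((w for w in vocab if w not in {a, b, c}),
               key=lambda w: cos(vocab[w], target))

print(solve_analogy("man", "woman", "king"))   # -> queen
```
-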
You're not doing well, but motivation is optional
-
Likelihood weighted Sequential Importance Sampling
Sequential Monte Carlo (SMC) methods are used to solve filtering problems in signal processing and Bayesian statistical inference. These methods involve approximating a probability distribution by drawing samples from it. Importance sampling is a key concept within SMC, where a proposal distribution is used to draw samples when the true distribution is unknown. The samples are then reweighted based on the difference between the true distribution and the proposal distribution. SMC can be applied to Hidden Markov Models (HMMs) and is particularly useful for tracking and estimating hidden states over time. The general process involves sampling new states from the proposal distribution and reweighting the samples based on how well they match the true distribution. Resampling may also be performed to enhance the method's efficiency. (Machine Learning)
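A bootstrap-flavoured sketch of likelihood-weighted sequential importance sampling on a one-dimensional linear-Gaussian model; the model parameters and the effective-sample-size resampling threshold are illustrative choices, not the post's.

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 50, 500                                   # time steps, particles

# Simulate x_t = 0.9 x_{t-1} + process noise, observed as y_t = x_t + noise.
x, ys = 0.0, []
for _ in range(T):
    x = 0.9 * x + rng.normal(0, 0.5)
    ys.append(x + rng.normal(0, 0.3))

particles = rng.normal(0, 1, P)
weights = np.full(P, 1 / P)
for y in ys:
    # Proposal = transition prior, so weights are updated by the likelihood only.
    particles = 0.9 * particles + rng.normal(0, 0.5, P)
    weights *= np.exp(-0.5 * ((y - particles) / 0.3) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < P / 2:
        idx = rng.choice(P, P, p=weights)
        particles, weights = particles[idx], np.full(P, 1 / P)

print(np.sum(weights * particles))               # filtered estimate of x_T
```
-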
Neural Tangent Kernel, Every Model trained by GD is a kernel machine (Review)
-
Some QA from Deep Learning (CS 462/482)
This collection of questions and answers covers various topics in deep learning, neural networks, and related concepts. (Machine Learning)
-
NEURIPS 2020
This document provides an overview of several tutorials and discussions related to deep learning and neural networks. Topics covered include neurosymbolic AI research, equivariant networks, abstraction and reasoning, and practical uncertainty estimation in deep learning. Key points include the exploration of neural-symbolic approaches, the concept of equivariance in network architectures, the relationship between abstraction and generalization, and methods for improving uncertainty estimates and out-of-distribution robustness in deep learning. The document also touches on Bayesian neural networks, Gaussian processes, and the challenges of proper priors and model specifications. (Review)
-
Adversarial NLP examples with Fast Gradient Sign Method
Explores the possibility of generating adversarial examples in natural language processing (NLP) using the Fast Gradient Sign Method (FGSM). Adversarial examples are inputs that change the model's output prediction while appearing benign. The FGSM is commonly used in computer vision to create adversarial images but can be adapted for NLP tasks. (Misc)
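A minimal sketch of FGSM adapted to NLP by perturbing embeddings rather than pixels, assuming PyTorch; the tiny classifier, vocabulary, and input tokens are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(100, 16)                 # toy vocabulary of 100 tokens
clf = nn.Linear(16, 2)                        # toy binary classifier
loss_fn = nn.CrossEntropyLoss()

tokens = torch.tensor([3, 17, 42])            # a made-up input "sentence"
label = torch.tensor([1])

emb = embed(tokens).mean(0, keepdim=True)     # mean-pool the token embeddings
emb.retain_grad()                             # keep the gradient w.r.t. the input
loss = loss_fn(clf(emb), label)
loss.backward()

# FGSM step: move the input in the direction that increases the loss.
epsilon = 0.1
adv_emb = emb + epsilon * emb.grad.sign()
print(clf(emb).argmax(), clf(adv_emb).argmax())   # prediction may flip
```
-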
EMNLP 2020
-
Variance of the Estimator in Machine Learning
-
Some Clustering Papers at ICLR20
-
A minimum keystroke (py)Debugger for Lazy ML/DS people who don't IDE
-
Recipe for building jq from source without admin(sudo) rights
-
The Sigmoid in Regression, Neural Network Activation and LSTM Gates
Delves into the use of sigmoid functions in regression and neural networks, particularly focusing on logistic regression and its equivalence to a single neuron in a neural network. It highlights the historical significance of sigmoid functions in modeling continuous outcomes, as many natural processes exhibit a sigmoidal relationship. (Machine Learning)
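A small sketch of that equivalence: logistic regression trained by gradient descent is exactly a single sigmoid neuron with a cross-entropy loss. The synthetic data is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # linearly separable labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)                    # the "single neuron" forward pass
    grad_w = X.T @ (p - y) / len(y)           # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w, b = w - lr * grad_w, b - lr * grad_b

print(((sigmoid(X @ w + b) > 0.5) == y).mean())   # training accuracy
```
-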
Arithmetic(Book)
-
Clean TreeLSTMs implementation in PyTorch using NLTK treepositions and Easy-First Parsing
Tree LSTMs are an extension of traditional LSTMs designed for tree-structured network topologies. Unlike sequential models, which process words in temporal order, tree-structured models follow the given syntactic structure of a sentence, composing phrases based on this structure. The implementation of Tree LSTMs involves a parser to generate a parse tree, conversion of this tree into instructions for combining words, and the use of these instructions to progressively update RNN units. (PyTorch)
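A sketch of a child-sum Tree-LSTM cell in PyTorch (after Tai et al., 2015); the composition order would come from the parse tree as described above, and the module names and dimensions here are mine, not the post's implementation.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, h_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + h_dim, 3 * h_dim)  # input/output/update gates
        self.f_x = nn.Linear(in_dim, h_dim)
        self.f_h = nn.Linear(h_dim, h_dim)

    def forward(self, x, child_h, child_c):
        # child_h, child_c: (num_children, h_dim); sum children into one vector.
        h_sum = child_h.sum(0)
        i, o, u = self.iou(torch.cat([x, h_sum])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # One forget gate per child, gating that child's memory cell.
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))
        c = i * u + (f * child_c).sum(0)
        return o * torch.tanh(c), c

# Usage: combine two child states into a phrase state (dimensions illustrative).
cell = ChildSumTreeLSTMCell(8, 8)
x = torch.zeros(8)                          # input at an internal node
h_kids, c_kids = torch.randn(2, 8), torch.randn(2, 8)
h, c = cell(x, h_kids, c_kids)
```
-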
Pad pack sequences for Pytorch batch processing with DataLoader
-
Modes of Convergence
-
Coordinate Ascent Mean-field Variational Inference (Univariate Gaussian Example)
-
Dirichlet Process Gaussian Mixture Models (Generation)
-
Gotchas in Cython; Handling numpy arrays in cython class
-
Onboarding for Practical Machine Learning Research
-
Equivalence of constrained and unconstrained form for Ridge Regression
-
Studying drug-drug interactions and predictors of adverse vascular outcomes
-
PyTorch Automatic differentiation for non-scalar variables; Reconstructing the Jacobian
-
From psychologist to CS PhD Student
-
Capturing Last-mile Transactions of Smallholder Palm Oil Farmers
-
Migrating from python 2.7 to python 3 (and maintaining compatibility)
-
Lagrange Multipliers and Constrained Optimization
The Lagrange Multiplier method is a technique for finding local minima and maxima of a differentiable function while subject to equality constraints on its independent variables. This method involves ensuring that the function and constraint are tangent to each other at the solution point. It can be applied to various optimization problems and has applications in machine learning, information theory, and more. While it can handle complex constraint equations and inequality constraints, it may not be suitable for large-scale problems. (Calculus, Optimization)
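A worked sketch using SymPy: maximise f(x, y) = xy subject to x + y = 1 by solving the stationarity conditions of the Lagrangian. The example function and constraint are mine, chosen for a clean closed form.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam")
f = x * y                         # objective
g = x + y - 1                     # constraint g(x, y) = 0
L = f - lam * g                   # Lagrangian

# Stationarity: all partial derivatives of L vanish, i.e. grad f = lam * grad g
# together with the constraint itself.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)
print(sols)                       # [{x: 1/2, y: 1/2, lam: 1/2}], so max f = 1/4
```
-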
Taylor Series approximation, Newton's method and optimization
-
Hessian, second order derivatives, convexity, and saddle points
-
Jacobian, Chain rule and backpropagation
-
Gradients, partial derivatives, directional derivatives, and gradient descent
-
Derivatives, differentiability and loss functions
-
Calculus for Machine Learning
-
Algorithms on Graphs: Fastest Route
-
Gibbs Sampling on Dirichlet Multinomial Naive Bayes (Text)
-
Markov Chain Monte-Carlo
-
EM Algorithm for Gaussian mixtures
-
Communicating Data Science
-
Cross disciplinary projects
-
Conjugate Priors
-
Closed form Bayesian Inference for Binomial distributions
-
DSO Advice