2019-12-23: I was probably in a state of euphoria when this was written in 2018, and while I’m ashamed of the overly pompous-sounding bits, I’ve decided to leave this unedited.

Last December I applied to and was accepted by Johns Hopkins University to do a PhD in Computer Science, specialising in Natural Language Processing.

2 years ago, my career trajectory looked incredibly different. My job as a psychologist didn’t require a single line of code. This post serves as a reminder not to fear the effort required to change career trajectory, and as a review, in hindsight, of the relative effectiveness of the various steps taken.

1. Changing job scope

The first thing I did was to change into a division that would require me to spend more time programming, so that personal interests and work were aligned. I was fortunate to be given the opportunity to join the Machine Learning and NLP group. I wouldn’t say that I was super into Machine Learning or data science, but I was most captivated by the NLP dream. At that time I understood the promise of NLP to be building technologies which could harness the great wisdom of human knowledge residing in written or spoken form. It sounded absolutely magical.

Nearly all of my technical projects centered around Machine Learning and NLP. There was much to do in this area; after 3 years I feel like I’ve barely touched the tip of the iceberg. I touched on Stance Analysis, Information Retrieval, Anomalous Text Identification and Collaborative Filtering, and enjoyed learning about these topics immensely.

It’s difficult to try and operate at a researcher level when one doesn’t even have the University pre-requisites to the pre-requisites. Gaps present themselves everywhere, all the time. Programming ability is a gap, IDE fluency is a gap, Math is a gap, Kullback-Leibler is a gap, git is a gap. Before you manage to catch up on one, another appears within the same sentence. Tools, Foundational Knowledge, Applied Knowledge, Programming ability, everything hits you at the same time.

I was clueless most of the time (I must have read about eigenvectors, eigenvalues and probabilistic graphical models more than 3 times and I still can’t confidently tell you what they are). But eventually some of these things started to stick, after the 5th time of reading them or so.

Overall Effectiveness: 8/10

  • Short-term: 8/10
  • Long-term: 8/10

Being paid to learn, it doesn’t get much better than this!

2. Part-time Masters in Knowledge Engineering

I started looking into part-time Masters programs in the first 3 months of my first job to try and empower myself with more skills to do my job better. The MSc gave good breadth across the technology space and gave me the confidence to start non-trivial conversations about commercial technologies like cloud computing, continuous integration, agile development, data warehousing etc.

This gave me awareness about technologies and served as a foundation for increasing knowledge in this area even if I did not necessarily practice it at work.

The best part of my MSc was Theoretical Foundations in Multimedia, offered by the School of Computing, which I got a C for. I struggled a lot but learnt important lessons from the Professor. The one that struck me most was that anyone can get state-of-the-art performance by downloading the latest algorithm from GitHub and running the model. But how many people are able to study the problem well enough, understand its characteristics and come up with an algorithm that addresses them?

Overall Effectiveness: 5.5/10

  • Short-term: 3/10
  • Long-term: 8/10

The System Science courses didn’t feel terribly useful for leveling up as a developer or in algorithms, as they were more like CTO-level courses where we explored many technologies but didn’t directly implement them. However, they offered a rare networking opportunity with people who were indeed at the CTO level ;)

The pure CS courses from the School of Computing were hard, and I struggled without the foundations. They hurt the GPA for applications, but were always a humbling learning experience, which I appreciate.

3. Rejections

The year before, I was rejected for the DSO PhD Scholarship, and rejected by all the PhD programmes that I had applied to.

Overall Effectiveness: 6/10

  • Short-term: 9/10
  • Long-term: 4/10

Reality. Rejections are the symptom of reaching for something beyond your current worth. The unpleasantness following rejection triggers useful thoughts on what is lacking, what needs to be done to reach the goal.

Even so, there are smarter ways to get at reality other than rejections. I think the advice of “Just try. You never know until you do” is very outdated in the internet age. Actually, one could know, if one had a way to get a good grasp of reality without necessarily going through the effort of ‘trying’. Skip the rejection, go straight to the reflection.

4. Competitions

Competitions have been mighty effective for me, especially those with leaderboards that effectively gamify the learning experience. My first competition was SemEval in 2016, less than half a year after changing jobs. I did not get far enough to make a competition submission, but I submitted a short paper to ACL which was narrowly rejected (3-3-2). (On a side note, I guess this counts as an example of how conference acceptance is pretty much a lottery, seeing how close I was to acceptance. I would have given the paper a much wider margin of rejection.)

Thus my first competition experience led to my first attempt at writing a paper. The greatest lessons came in the form of comments and inputs from my Principal researcher, which impressed upon me the rigor required in preparing a submission for a technical paper. I would later benefit from these lessons when writing the paper for CIKM 1.5 years later.

Competitions highlighted that simple familiarity was a major bottleneck in how well one could do. I’m going to assume that semi-serious participants would have read up on strategies of how to move up rankings. Given that the competition period is usually short, it can be reasoned that the winner should have a combination of knowing what exists, what to try from open literature, how to iterate (what to try next), and having sufficient competency to tweak someone else’s solution slightly.

I took part in two of the CIKM AnalytiCup competitions in November 2017. The first one, Lazada, was on predicting whether item titles were clear and concise. This was a ‘kaggle type’ competition. By that I mean that 1st and 10th place were only a few decimal places apart in accuracy. Despite feeling confident of being quite decent at a Text Classification task, I finished only slightly above average in the rankings and spent time trying a lot of stuff that didn’t really help at all, reflecting my inexperience.

The show-and-tell nature of hackathons also exposed me to web programming and open-source visualisation tools. See here for the most recent exploit. I am happy to have developed some basic skills, as it’s hard to run from show-and-tell.

Overall Effectiveness: 6/10

  • Short-term: 6/10
  • Long-term: 6/10

After a while, the art of winning competitions starts to become formulaic. Most of what is done here is built to be thrown away, or is a rehash of what you already know, since there is very limited time to get stuff up and nothing matters other than the final number or the final set of presentation slides. Pitching an idea, presentation skills and work allocation are useful skills in the long run.

5. Side projects

This section was inspired by three key conversations.

The first one was with the organisation’s HR Director, who had just announced that she was leaving. She shared many things, in particular that she would encourage her staff to aim to have something to update their CV with every year. The second conversation was with a manager, who told me that I needed to specialise to differentiate myself. The last was a conversation with myself. I was keenly interested in NLP, and side projects just seemed like a great way to keep myself constantly learning and experimenting with new technologies.

Chatbots were a very natural option for side projects. I had a strong affinity for this area given my previous psychology and HCI background. Also for dialogue systems, the applications are endless because essentially we are automating a traditionally human conversation-based service. Dialogue is also one of the most challenging areas in NLP, and would allow me to explore a host of techniques and subtasks in building chatbots.

Overall Effectiveness: 7/10

  • Short-term: 7/10
  • Long-term: 7/10

Side projects did not do much for algo skills, but did a lot for software skills: keeping things clean for dev vs deployment, unit testing, version control etc. This allowed me to implement ideas at work faster and more cleanly, which translates into more effective research practices.

6. Courses and Machine Learning Summer School

Around 1 year after transferring to the new division, I became increasingly dissatisfied with my inability to formalise elegant solutions to problems. I was decent at ‘hacking’ stuff up by stringing technologies from various places into an ML pipeline system, but I never felt impressed by any of my own research ideas. I attributed this to a lack of understanding from first principles, hence I signed up for the Machine Learning Summer School at the Max Planck Institute for Intelligent Systems, Tübingen. The company supported me with the course fee and time off for training, but the rest of the trip, including air tickets and living costs, was fully self-financed.

It was a bit of an awakening. The amount of assumed knowledge, the detail in which some lecturers covered the material, and the introduction of so many fields of Machine Learning that I hadn’t known existed (e.g. submodularity) were crammed into the span of 2 weeks. I had reserved an additional day for travelling but spent it trying to understand Bayesian non-parametrics. I had gotten much more than what I had signed up for.

Overall Effectiveness: 6/10

  • Short-term: 4/10
  • Long-term: 8/10

Good for confidence and genuine learning, but not particularly useful for applications. Although I think potential supervisors will appreciate the fact that you took a course on your own, it has to be backed up by other things on your CV. Foundations are a long-term investment, and it is not easy to see the fruits of this effort in the short run.

In fact, I focused on self-taught courses much more AFTER getting admitted into the PhD programme. In the 9 months leading up to the start of the programme, I focused on Deep Learning, Data Structures and Algorithms, and Math for ML.

7. Taking students

This was not something I planned in order to get ahead. My personal policy is to take on as many students as my time can afford. As much as possible, I wanted to give students the opportunity to learn and grow that was afforded to me by the previous generation. My first supervisor once told me that if I was truly appreciative of his guidance, the best thing I could do was to pay it forward.

Overall Effectiveness: Non-quantitative

8. Acknowledgements

I would like to dedicate this post to my academic referees and workplace supervisors. There are many people to whom I owe this opportunity, but in particular my mentor in DSO, Chieu Hai Leong. If there was anything clever/impressive I did during my DSO days, the idea probably came from him. Also to Jet, my life partner and collaborator on many of these non-work-related projects.