Things I Don't Know

I was watching some of Richard Schneeman’s Rails screencasts, and he mentioned that one thing he wished he’d done when he first learned to program was keep a blog and include, from time to time, the things he didn’t yet know/know how to do. So, here it is: the things I don’t know.

Algorithms & Data Structures

I knew from the start that this was an area I’d be weak in—it’s generally not something you learn “on the job” and is extensively covered in undergraduate computer science classes, of which I only took one—so I’ve made a conscious effort to learn as much as possible in this area. I picked up Kyle Loudon’s Mastering Algorithms with C and George Heineman’s Algorithms in a Nutshell, and while I feel like I’m really starting to internalize the concepts and form the right mental models, I’ve still got a long way to go. (I’ll pick up the canonical CLRS as soon as I can find it for under $80.)

I’ve learned a bunch of sorting algorithms and know their big-O time complexities, but I’ve never sat down and implemented a search algorithm from scratch. I know what linked lists, (de)queues, stacks, B-trees, binary trees, tries, hash tables (chained and open-addressed), and graphs are, but I’ve only implemented a few of these in C. (I think more C practice in general will help with a lot of the topics covered in this post; see below.)
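For reference, here is a minimal sketch of binary search in Python, probably the simplest search algorithm to write from scratch. It assumes the input list is already sorted in ascending order, and the name binary_search is only illustrative.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # midpoint of the remaining range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1
```

Each pass halves the remaining range, which is where the O(log n) running time comes from.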

Design & Architecture

I picked up the GoF book and am working through it, and while I understand the thought and history behind the patterns, they’re not yet obvious to me when I read code “in the wild.” (That is, I probably wouldn’t immediately say, “Oh, this is the Decorator pattern” after reading someone else’s program.) I think this will come with practice, and one of my immediate goals is to replace the janky global variable in Ruben with a singleton.

In terms of general architecture, I think this will improve with continued reading and writing. Ten years ago, I had no idea how to structure a poem or book of poems; I learned by reading tens of thousands of poems and hundreds of books of poetry, as well as by writing my own. Similarly, I expect to have to read thousands of programs across hundreds of projects (while continually writing my own) before I feel like the process is intuitive. The process will also become faster, since I’ll know (through having read others’ mistakes and having made my own) what works well and what doesn’t; as the saying goes, “In the beginner’s mind there are many possibilities, in the expert’s mind there are few.”

Distributed Programming & the Internet

One of the weaknesses I’m working really hard to shore up is my general understanding of how the Internet works, how technologies like TCP/IP and WebSockets work, and techniques/best practices for programming distributed systems. It’s a huge field to tackle, and while I have a general understanding of everything from REST to DNS lookup to how the browser renders a page, I know I’m missing a lot of the underlying details. When would you want to use WebSockets and when would you want to use XHR? What are the individual steps in domain name resolution? What are the most important things I need to know to write more efficient CSS or JavaScript? (I actually sort of know the answer to the WebSockets question, but these are the kinds of things I’ve asked friends and coworkers over the past few months.) The deep details of the web and the browser are not only really interesting to me, but necessary for me to understand as I work to become a better web programmer.

Databases

I understand the ideas behind databases and RDBMSs and have written a bit of SQL, but I don’t understand the actual relational calculus. (I’m not sure this is truly necessary, but if it will deepen my understanding of SQL and relational databases, I feel like it can’t hurt.) There are some concepts, such as queries and JOINs, that I’ve reinforced by writing code and working with a database, and others (like database normalization/denormalization) that I’ve never put into practice. A good example of something I figured out through experience: until last week, I didn’t understand that auto-incrementing primary keys generally only ever increase; that is, IDs for deleted rows aren’t reused. (You might notice that the previous post had an ID of 6, but this one has an ID of 11. Evidence of learning!)
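The ID behavior is easy to reproduce. This is a small sketch using Python’s built-in sqlite3 module with an in-memory database; the posts table and its columns are made up for illustration and are not this blog’s actual schema. It assumes SQLite’s AUTOINCREMENT keyword, which guarantees that IDs are never reused (other databases get the same effect from sequences or auto-increment columns).

```python
import sqlite3

def ids_after_delete():
    """Insert three rows, delete the last one, insert another, return the IDs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; AUTOINCREMENT tells SQLite never to reuse an ID.
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (title) VALUES (?)",
        [("first",), ("second",), ("third",)],
    )
    conn.execute("DELETE FROM posts WHERE id = 3")
    conn.execute("INSERT INTO posts (title) VALUES (?)", ("fourth",))
    ids = [row[0] for row in conn.execute("SELECT id FROM posts ORDER BY id")]
    conn.close()
    return ids

print(ids_after_delete())  # [1, 2, 4]: the deleted ID 3 is not handed out again
```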

I’ve done a bit of work with MongoDB and I understand the broad differences between relational databases like PostgreSQL and document-oriented NoSQL databases like Mongo, but again, I’m somewhat lacking on the details. I do understand that Mongo’s storage of entire documents can obviate computationally expensive JOINs and make reads of related data fast, and I understand that NoSQL solutions like Mongo are designed to scale horizontally more easily than a traditional relational database, but if you plunked me down at the command line and asked me to shard the database or deploy a replica set, I wouldn’t know where to begin. Some of this is particular to Mongo, though, and I feel like practicing more with that technology will help close some of the practical knowledge gaps.

Systems Programming

This is another area where I feel like I understand the basics, but lack deep knowledge of the details due to never having done a lot of this kind of work before. I understand generally how compilers, linkers, and interpreters work, but would be pretty lost if asked to write my own compiler from scratch. Thanks to learning C, I do know some assembly and have a working knowledge of the stack and heap, CPU registers, and memory addressing, but I feel like that knowledge could be improved. I understand the concepts of virtual memory and paging, but only in a very general way. I’m not sure if the Internet or a solid book on memory/machine architecture would be more useful, and while I understand this might not be necessary for me to know if I’m sitting around writing Ruby and JavaScript all day, I still feel like it’d be a pretty serious failing for me not to know it better.

Speaking of the machine—I want to learn way more UNIX stuff. I know what the kernel is and what it does, as well as the general concerns of the operating system (device, file system, memory, and process management), but I don’t know how, say, the scheduler actually works. I’ve never written a shell script of more than a dozen lines or written anything interesting, like a cron job or bootstrap script, so I want to dig into that more. I think Kernighan and Pike’s The UNIX Programming Environment and Cameron Newham’s Learning the bash Shell would be great for this. I’m especially excited to pick up Kernighan & Pike, since I really enjoyed the straightforward (and often witty) K&R.

Lastly, threads and concurrency are still sort of a mystery to me. This is yet another arena where I understand what threads and processes are and what concurrency is, but having never written a program in which I had to manage threads and processes manually, I don’t think I fully understand them. I’m sure I will after the first time I spend hours trying to debug a program with subtle race conditions, but until then, I’m going to seek out projects designed to help me work with threads/concurrency and better understand the topic.
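As a starting point, here is a small sketch of the classic shared-counter exercise in Python. Several threads increment one counter, and a lock keeps each read-modify-write step from interleaving with another thread’s. The function name and parameters are illustrative assumptions only.

```python
import threading

def run_counter(num_threads, increments_per_thread):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            # Without the lock, two threads could read the same old value
            # and one of the increments would be lost (a race condition).
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run_counter(4, 10000))  # 40000
```

Removing the lock makes the final count unreliable in principle, which is exactly the kind of subtle bug that is hard to reproduce on demand.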

Programming Languages

In descending order of fluency, I know Ruby, JavaScript, C, and Python. After talking to a bunch of friends and coworkers, I’ve decided I want to master Ruby as much as possible, then go back and really master JavaScript and C. So far I’ve tried to progress through all three more or less simultaneously, but I’ve been advised (and now agree) that it’s better to become truly fluent in one language, then pick up others, rather than try to attain fluency in three languages at once.

I’ve also dabbled in Clojure and Haskell, but I still don’t feel like I really grok functional programming. I’ve made some headway by thinking about and trying to write JavaScript in a more functional way, but I still reflexively reach for state and objects rather than composed functions and tail recursion. I think this is just a question of assimilating a totally new mental model, which will definitely take time, and it’s something I want to invest in after becoming a better Ruby and C programmer. (I think really mastering JavaScript will be a nice segue into working with more purely functional languages, and I look forward to learning that mode of thinking through the lens of a language I already know well.)
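To make those two ideas concrete, here is a rough sketch in Python; all of the names are made up for illustration. compose builds one function out of several, and sum_to is written in tail-recursive style, carrying its running total in an accumulator instead of mutating state. One caveat: Python does not eliminate tail calls, so this shows the shape of the technique rather than its performance benefit.

```python
from functools import reduce

def compose(*funcs):
    """Compose right to left, so compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)

def sum_to(n, acc=0):
    """Sum the integers 1..n in tail-recursive style using an accumulator."""
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)  # the recursive call is the last action

def add_one(x):
    return x + 1

def double(x):
    return x * 2

double_then_add_one = compose(add_one, double)
print(double_then_add_one(5))  # 11
print(sum_to(10))              # 55
```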

As mentioned, I want to learn shell scripting as part of my UNIX/systems programming development. From what I’ve seen, this shouldn’t be too difficult after mastering a scripting language like Ruby.

Tools

Finally, there are two everyday tools I want to get much better at using: Vim and Git. Both are a matter of daily practice and reading the documentation/doing tutorials for each; practice will instill the necessary muscle memory, and better learning the commands will give me a much wider range of options when trying to accomplish a particular task. In terms of books, I’ve started reading Drew Neil’s Practical Vim and I’m planning to pick up a copy of Scott Chacon’s Pro Git.

I didn’t realize how long this post would be until I wrote it, which is somewhat humbling but simultaneously really galvanizing—now that everything I currently feel like I need to learn is in one place, I can iterate on this list and backfill my knowledge in a systematic way. Of course, this isn’t everything I want or need to know, and I’m sure that (hydra-like) each thing I learn will be replaced by two more things I want to learn. But that’s the whole point, right? One of the major perks of software engineering is that there’s always something new to pick up and learn, and I can’t imagine not being excited about that.