Why We Need More Complex Programming Languages (Yes, You Heard Me Right!)

My daughter is learning how to read right now.  As I was thinking about this blog post, I just walked past my wife and her working on some very basic reading skills.  It is quite a bit of work to teach her everything she needs to know to read and write the English language.

In fact, it will be years of hard work before she’ll actually be able to read and write with any measure of competence—at least by our adult standards.  We tend to take language for granted, but spoken and written languages are difficult—exceptionally difficult.

Even as an adult, writing this post is difficult.  The words don’t flow perfectly from my mind.  I strain to phrase things in the proper way and to use the proper punctuation.

But, even though it is difficult to learn a written language, we make sure our kids do it, because of the high value it gives them in life.  Without the skills to read and write a language, most children’s future would be rather bleak.

The more I thought about this idea, the more I realized how simple programming languages are compared to the complexity of a written or spoken language.

The argument for more complexity

The irony of me arguing for more complexity and not less doesn’t escape me, but even though I strive to make the complex simple, sometimes we do actually need to make things more complicated to achieve the best results possible.

I’ve thought about this for a long time and I believe this is the case with programming languages.  Let me explain.

Before I get into programming languages specifically, let’s start off by talking about human languages.

I speak and write English.  English is considered to be the language with the largest total vocabulary and also one of the most difficult languages to learn, because of the flexibility in the ways in which you can compose sentences with it.

It is very difficult to learn English.  I am fortunate that I am a native English speaker and grew up learning English, but for many non-native English speakers, the language continues to be a challenge—even years after they are “fluent” in the language.

There is a huge benefit though, to being fluent in the English language—expressiveness.  I don’t profess to be an expert in foreign languages—I only know a little bit of Spanish, Brazilian Portuguese and Japanese, myself—but, I do know that English is one of the most expressive languages in existence today.  If you want to say something in English, there is most likely a word for it.  If you want to convey a tone or feeling with the language—even a pace of dialog, like I just did now—you can do it in English.

As I said, I can’t speak for other languages.  But, having lived in Hawaii, I can tell you that Hawaiian is a very small language and it is difficult to express yourself in that language.  Sign language is another example of a very small language which is fairly easy to learn, but is limited in what it can convey and the way it can convey it.

I say all this to illustrate a simple point.  The larger the vocabulary of a language and the more grammatical rules, the more difficult it is to learn the language, but the greater power of expressiveness you have with that language.

Breaking things down even smaller

I promise I’ll get to programming languages in a little bit, but before I do, I want to talk about one more human language concept—alphabets or symbols.

The English alphabet has 26 letters in it.  These 26 letters represent most of the sounds we use to make up words.  26 letters is not a small number of characters, but it is not a large amount either.  It is a pretty easy task for most children to learn all the letters of the alphabet and the sounds they make.

The text you are reading right now is made up of these letters, but have you ever considered what would happen if we had more letters in the alphabet?  For example, suppose instead of 26 letters, there were 500 letters.  Suppose that we made actual symbols for “th”,”sh”,”oo” and so forth.  Suppose we made the word “the” into a symbol of its own.

If we added more letters to the alphabet, it would take you much longer to learn the alphabet, but once you learned it you could read and write much more efficiently.  (Although, I’d hate to see what the 500 letter keyboard would look like.)

My point is that we are trading some potential in the expressiveness we can pack into a limited number of symbols for some ease in learning a useful set of symbols.

As you were reading this, you might have thought that this is exactly what languages like Chinese and Japanese do—they use a large number of symbols instead of a small alphabet.  I don’t know enough about these languages to know the answer for sure, but I’d bet that it is much easier to read a Chinese or Japanese newspaper than it is to read an English one—or at least faster.

We could take the same exercise and apply it to the number system.  Instead of using base 10, or having 10 symbols in our number system, we could have 100 or even 1000.  It would take a long time to learn all our numbers, but we’d be able to perform mathematical operations much more efficiently.  (A smaller scale example of this would be memorizing your times tables up to 99 x 99.  Imagine what you could do with that power.)
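
To put rough numbers on the base-10 versus base-100 or base-1000 idea, here is a small sketch in Python.  The function name digit_count is made up for this illustration, and it only measures how many symbols a number takes to write down, not how hard the bigger symbol set would be to learn or to calculate with.

```python
def digit_count(n: int, base: int) -> int:
    """Return how many digits the non-negative integer n needs in the given base."""
    if n < 0 or base < 2:
        raise ValueError("n must be non-negative and base must be at least 2")
    count = 1
    # Each division by the base strips off one digit.
    while n >= base:
        n //= base
        count += 1
    return count


# Compare how long one million is when written with 10, 100, or 1000 symbols.
for base in (10, 100, 1000):
    print(base, digit_count(1_000_000, base))
```

One million takes 7 digits in base 10, 4 in base 100 and 3 in base 1000, so the written form keeps shrinking while the number of symbols you have to memorize grows much faster.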

What does all this have to do with programming languages?

You really are impatient aren’t you?  But, I suppose you are right.  I should be getting to my real point by now.

So, the reason I brought up those two examples before talking about programming languages is that I wanted you to see two things: the vocabulary and grammar of a language greatly influence its expressiveness, and the basic constructs of a written language greatly influence its density, that is, its ability to express things concisely.

Obviously, we can’t directly map human written languages to programming languages, but we can draw some pretty powerful parallels when thinking about language design.

I’ve often pondered the question of whether or not it is better to have a programming language that has many keywords or few keywords.  But, I realized today that was an oversimplification of the issue.

Keywords alone don’t determine the expressiveness of a language.  I’d argue that the expressiveness of a language is determined by:

  • Number of keywords
  • Complexity of statements and constructs in the language
  • Size of the standard library

All of these things combined work together to make a language more expressive, but also more complicated.  If we crank up the dial on any one of these factors, we’ll be able to do more with the language with less code, but we’ll also increase the difficulty of learning the language and reading code written in that language.

Notice, I didn’t say in writing the language.  That is because—assuming you’ve mastered the language—the language actually becomes easier to write when it has more constructs.  If you’ve ever run across someone who is a master of Perl, you know this to be true.  I’ve seen some Perl masters that could write Perl faster than I thought possible, yet when they came back to their own code months later, even they couldn’t understand it.

Looking at some real examples

To make what I am saying a little more concrete, let’s look at a few examples.  I’ll start with C#, since it is a language I am very familiar with.  C# is a very expressive language.  It didn’t start out that way, but with all the keywords that have been added to the language and the massive size of the base class libraries, C# has become very, very large.

C# is an evolving language.  But, right now it has about 79 keywords.  (Feel free to correct me if I am wrong here.)  As far as languages go, this is pretty large.  In addition to just keywords, C# has some complex statements.  Lambda expressions and LINQ expressions immediately come to mind.  For someone learning C#, the task can be rather difficult, but the reward is that they can be pretty productive and write some fairly concise code.  (At least compared to a more verbose language like C or C++.)  Java is pretty close in most of those regards as well.

But, take a language like Go.  Go is a language with only 25 keywords.  It makes up for this by having some fairly complex language constructs and having a pretty robust standard library.   When I first learned Go, it took me perhaps a week to feel like I had a pretty good grasp of the language.  But, it took much longer to learn how to use Go properly.  (And I still have plenty to learn.)

At the far end of the spectrum, we have languages like BASIC.  Different BASIC implementations have different keyword counts, but most of them are pretty low and the constructs of the language are very simple.  BASIC is a very easy language to learn.  But, because it is so easy to learn BASIC and BASIC is so simple, the average programmer quickly outgrows the capabilities of the language.  BASIC isn’t very expressive and it takes many more lines of code to write the same thing you could write in a few lines of C# or Go.
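
Here is a rough sketch of that tradeoff, written in Python rather than BASIC, C# or Go so that both versions can sit side by side in one language.  The two function names are invented for this example.  The first version sticks to loops, conditionals and a plain dictionary, roughly what a small language gives you.  The second leans on a larger standard library (collections.Counter and str.split) to say the same thing in one line.

```python
from collections import Counter


def count_words_verbose(text):
    """Count words using only loops, conditionals, and a plain dictionary."""
    counts = {}
    word = ""
    # The trailing space makes sure the last word gets flushed.
    for ch in text.lower() + " ":
        if ch.isspace():
            if word != "":
                if word in counts:
                    counts[word] = counts[word] + 1
                else:
                    counts[word] = 1
                word = ""
        else:
            word = word + ch
    return counts


def count_words_concise(text):
    """Same result, relying on richer library constructs."""
    return Counter(text.lower().split())


sample = "the cat saw the other cat"
print(count_words_verbose(sample) == count_words_concise(sample))
```

The concise version is quicker to write once you know Counter and split, but a reader who has never seen them has more to look up, which is the learning cost described above.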

For a much more comprehensive overview of differences between programming languages, I’d recommend Programming Language Pragmatics.  It goes into detail about many different languages and the differences between them.

What more complex programming languages buy us

It feels really weird to be arguing for something to be more complex, since the mission of this blog is to make the complex simple, but in the case of programming languages, I think the tradeoff of increased complexity is worth the cost of extra learning time.

Consider how much more complicated the English language is than any programming language.  To be able to read the very words you are reading now, you have to understand a vocabulary of several thousand words, recognize most of those words on sight, and understand a very complicated set of mechanics which govern the grammar of the language.  There aren’t even concrete rules; much of what is “right” or “wrong” is based on context.

Yet, even with all this complexity, you are able to do it—our brains are amazing.

Now, imagine what would happen if we decided that English was too difficult of a language and that we needed to dumb it down.  What if we dropped the vocabulary down to, say, 200 words and we got rid of the complex rules?  What you would have is basically a Dr. Seuss book or some other early reader type of children’s book.  It would be very difficult for me to convey the kinds of thoughts I am conveying to you right now with those restrictions.

When you compare even the most complex programming language to the English language, it is no contest.  The English language is far more complex than any programming language we have ever conceived of.  I don’t know of a programming language that the average person couldn’t learn reasonably well in a year’s worth of time.  But, if you were to try and teach someone written English in a year—well, good luck to you.

If we created much more complex programming languages, we would have a much larger learning curve.  But, in exchange, we’d have a language—that once mastered—would allow us to express algorithmic intent at a level we can’t even imagine now.

Not only would we be able to express our intent more clearly and more concisely, but we’d also greatly reduce the total lines of code and potential for bugs in our software.  Less code equals less bugs.

The drawbacks

Now, I’m just playing around mentally here.  I “half” believe what I am saying, because I am just exploring ideas and possibilities.  But, even in this mental exercise of thinking about what would happen if we created programming languages as complex as written languages, I can’t ignore the drawbacks.

Obviously, the biggest drawback would be the learning curve required to learn how to program.  Learning how to program—at least how to do it well—is pretty difficult now.  I still think people make it more complicated than it needs to be, but software development is a much more difficult vocation to pick up than many other career choices.

If we created more complex programming languages, we’d have to count on many more years of learning before someone could even really write code or understand the code that is already written.  It might take 4 or 5 years just to understand and memorize enough of the language to be able to use it effectively.

We could of course combat this to some degree by starting beginners on easier languages and advancing them up the chain to more complex ones. (In fact, writing this article has convinced me that would be the best way to learn today.  We shouldn’t be starting developers with C# or Java, but instead should teach them very simple languages.)

We would probably also be forced down a smaller path of innovation, as far as programming languages go.  The world can support 100’s of simple programming languages, but it can’t easily support that many complex languages.  We might end up with one universal language that all programmers used.  A language of this size would be very unwieldy and hard to advance or change.  It would also take a massive effort to create it in the first place, since written languages developed naturally over hundreds of years.

That’s enough fun for now

After writing this article my brain is hurting.  I’ve been considering writing this post for a while, but I wasn’t sure exactly where I stand on the issue.  To be completely honest with you, I still don’t.  I do think that more complex programming languages would offer us certain benefits that current programming languages do not, but I’m not sure if the drawbacks would be worth it in the end or even what a significantly more complex programming language would look like.

What about you, what do you think?  Am I just crazy?  Is there something significant I missed here?

  • pizzapanther

    Interesting thoughts but not convinced. A big basis of this logic is that English is the most expressive. But is English really the most expressive? With Chinese there are more characters to learn so in order to increase your expressiveness you have to learn more characters. In English you have to learn a smaller set of characters that you can combine to form words. Smaller words can often make up bigger words. This means I can sometimes see a big new word and often know what it means by context and how it is made. I’m guessing this is harder to do in Chinese.

    From Wikipedia it looks like Chinese can have up to 40,000 characters and to read an average newspaper you need to know 3000 characters. So my guess would be that Chinese is just as expressive as English; it is just harder to get there. I would theorize that if you measure expressiveness in English speaking people versus Chinese, English would have a higher rate of expressiveness.

    So to summarize, Chinese has a flat approach that scales directly with how much you know, while English has a more layered approach which scales faster with how much you know.

    So it is not necessarily the complexity of the programming language but its architecture. Does the language have constructs that build on each other or is it more of a flat architecture?

    That is my two cents, it may be utter crap but that’s what I’m thinking.

    • jsonmez

      You make a great comparison. I think you are on to something here. In some ways you prove my point, but in other ways you disprove it.
      What I mean is that I agree that Chinese is probably more expressive. And it does it in a more compact format, so it is actually faster to read and write (theoretically from someone who doesn’t know Chinese.) By having more keywords and “options,” we can build languages that are harder to learn but are expressive and concise when written. By having less keywords, we can build languages that are easier to learn, and we can make them more expressive by adding layers of meaning.

  • cwbrandsma

    There is ample room for both simple and complex languages. For example: assembly is a pretty simple language, but at the same time, complex enough to create every other language out there. If I’m using a shell language (like Bash), I don’t want a very complex language either. In fact, once you get to the size of C++, things can get disastrous, as no one really understands the entire language!

    But, as with most languages, the complexity isn’t in the expressiveness of the language, but in the features of the libraries you can use with that language. That sets the stage for what problems you try to solve with it, and how complex the problems are. That is really why the size of C++ isn’t really the issue, but learning the ins-and-outs of STL is rather daunting.

    If you really want to get into an interesting topic, start playing with language symmetry. That is, how consistently does the language feel like itself, and not a mash-up of language features? I think C# and Java do pretty well here, but C++ and Objective C fall flat (that is actually why the languages are difficult: it feels like you are context switching in the language, because you are).

    • jsonmez

      That is a really good point. But I still wonder if we have one “all purpose” language that was expressive enough and it was widely known enough, wouldn’t it work for most cases?

      You are right about the library features being a key point though. For some languages it is pretty difficult to separate the standard libraries from the language itself.

      Language symmetry isn’t really something I have given much thought to, but it makes sense.

  • jsonmez

    Thanks Brian. I have to agree with you about Dart. I do enjoy Dart quite a bit. I am also a big fan of Go, but I haven’t actually used Python that much.

    Perl is actually one of the languages that got me thinking about this topic. I have somewhat of the same viewpoint as you do on Perl. I actually got into loud arguments expressing this viewpoint early in my career. But, I was starting to wonder… what if we all were Perl masters? Would we also be extremely efficient?

    • BrianS

      The problem with Perl (and perhaps your overall idea) is the large decision tree you have to go through to figure out meaning. There are so many ways something could be expressed, you spend a lot of effort trying to figure out how people are trying to express themselves. If multiple people are involved (as in a large project) this problem is compounded. This is fine for poetry, but not desirable when precise concepts are manipulated in bulk.

      In academia and other specialized circles, people are often said to speak in their own language. One of the reasons this is true is so that they can more quickly and easily talk about a specific set of problems. It saves time and by and large the specialists are speaking the same language. The problem with Perl is that it allows everyone to speak in their own language. You need some overall structure if you want to understand _quickly_ what the other person is saying. And in a computer program of any significant size, you will need that speed of understanding.

      Perl worked great when it was 20 or even 200 line scripts. When you got to something the size of Majordomo, it was just a mess.

      • jsonmez

        Yes, that is true. I do wonder though if we knew Perl as well as we know English, wouldn’t we be able to comprehend it just as easily? You are probably right though, since I haven’t seen even the most skilled Perl developer be able to easily read someone else’s Perl code.

  • shaurz

    It takes Japanese and Chinese people much longer to get a full grasp on their language, and their newspapers are much harder to read. I disagree with the premise that language complexity buys you anything. Alphabets are simply a superior system to logograms and syllabaries. Fluent alphabetic readers read whole words anyway, not individual letters. But it is impossible to read the individual letters of a logogram because the symbol is unique and must be learned by rote. There is a complexity sweet-spot.

    With language complexity you get kitchen sink languages like C#, C++ and PL/I. As language complexity increases, the number of interactions between language features goes up exponentially. It is not possible for the human mind to grasp this in its entirety. Instead of using mental resources to deal with the program, they are wasted on dealing with language issues.

    I agree that some extra features allow a greater level of expression – to a point. For example, parametric polymorphism allows more complex type relationships to be expressed that would otherwise require ugly casting. But there is always a trade-off.

    • jsonmez

      Good point about reading whole words at a time instead of seeing the letters. It seems there is a point of diminishing returns for expressiveness vs complexity.

    • Silwing

      Saying that Japanese and Chinese symbols are unique and must be learned by rote is simply wrong. Take a look at the books written by James W. Heisig and you will see why.
      You are right, however, about reading whole words, but that requires knowledge of the word just as reading a Japanese character requires knowledge of the character. It isn’t that much different. We take our alphabet for granted because we already know it, and learning Japanese is hard for many people because they do not know the “alphabet” beforehand. Chinese people have a head start learning Japanese, even though the Japanese and Chinese languages are about as alike as Danish and Spanish, because they know most of the “alphabet” already.

  • Mike Cattle

    When the abstract language hits the ground, though, it has to exist in a world with managers who think that interns are capable of writing production code for a fraction of the salary. Yes, anyone can write code that works, but those who treat programming like a craft will write non-fragile code that works and can easily change to fit new business requirements in the future. When it comes to languages, a good language is one that lets developers (even interns) “fall into a pit of success”. That is, good code is written by default. I don’t see how adding to the complexity of a language would help achieve this goal. (And, as others have pointed out, programming languages are literal, and don’t require tones, subtle nuances, or inflections, the way spoken languages do.)

  • Chris C.

    I believe that an ideal mix was what I experienced with Visual Basic versions 2 through about 5.
    The language was simple and easy to learn, but for those advanced things I needed (like in VB2 a quick search through a listbox containing hundreds of items) I could “drop down” and use the Windows API.
    There was a simple/easy opening for those that wanted to come be with us, but there was still sort of a “cheat” for those of us that knew how to use it.
    With an OCX you could even get at the Windows event loop (and bless Dan Appleman for those Win API books!)
    Yes, it was the age of the non-programmer where anyone could do 80% of a project quickly and never be able to finish the last 10% because they simply didn’t have enough knowledge. (Think client server projects with an MS Access backend that worked great for one user but never scaled to the dozen simultaneous users as promised – it only worked well with one user because the programmer didn’t really know enough).
    I find C# to be reasonable in complexity and expressiveness, that’s just my $0.02

    HTH,
    -Chris C.

  • jsonmez

    I saw that, but I hadn’t checked it out yet. Thanks for reminding me.

  • jsonmez

    Wow, great find. I am shocked by the results. I thought Chinese would be faster for sure.

  • http://bit-builder.blogspot.com/ Justin Hewlett

    Interesting thoughts, John.

    A complex language might be really good for a solo developer (like the example of the perl expert), but as soon as a team is involved, the code needs to be understood and maintained by everyone without expending too many cycles on things like differing styles. Think of C++ — it has so many features that many teams have “style guides” that define which small subset of the language they will allow. Otherwise there is too much inconsistency and maintenance becomes much harder.

    I think really good languages provide a simple set of building blocks that are consistent, intuitive, and allow you to build higher levels of abstraction as needed. Think of Ruby. The language itself is fairly simple, but due to the flexibility of the syntax teams are able to create internal DSLs that better express their intent.

    Of course, it also helps when the language has a lot of the very common idioms built right into the language/standard library. Ruby also does a good job with this, with a strong collections API that is in many ways similar to linq.

    Javascript is another language that comes to mind as one that, despite its obvious flaws, is quite simple yet flexible and powerful.

  • sdjam33

    John; will this new complex programming language have a bunch of idioms? Oh wait, I may have seen that in some coding around here already :)

    After reading this and some comments I love and hate the idea of a super complex programming language at the same time. I believe it would take a long time to fully master most programming frameworks. That being said you do not have to fully understand a framework to create a conducive application. Like English; I seem to be looking words up in the dictionary (Google) a lot.

    Thanks for your posts; I enjoy reading them as they help me to think.