Domain Specific Languages

Domain Specific Languages (DSLs) are specialized languages carefully crafted to serve one single goal.

Why should you be interested in them?

Because they can completely change how an organization works. In particular, they can dramatically increase productivity and change how developers and domain experts communicate.

We are going to answer three questions:

What are Domain Specific Languages? We will look at some concrete examples.
What are the benefits you can achieve by using them?
How can you create DSLs?

What Are Domain Specific Languages?

You are probably familiar with General Purpose Languages, like Java or Python, which are able to express every possible algorithm.

Domain Specific Languages are different: their strength lies in doing only one thing, but doing it well. They are specialized languages that can be built to be used inside one single company—something I routinely do for my clients. However, some DSLs become widely used, and you probably already know some of them.

Examples of Domain Specific Languages

Domain Specific Languages can serve all sorts of purposes. They can be used in different contexts and by different kinds of users. Some DSLs are intended to be used by programmers and therefore are more technical, while others are intended to be used by someone who is not a programmer, and therefore they use less geeky concepts and syntax.

DOT – A DSL to define graphs

DOT is a language that can describe graphs, either directed or non-directed.

From this description, images representing these graphs can be generated. To do so, you use a program named Graphviz, which works with the DOT language. From the above example, you would get this:

An image generated using the DOT DSL

The language also lets you define the shape of the nodes, their colors, and many other characteristics. But the basics are pretty simple, and almost everyone can learn how to use it in a matter of minutes.
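To give a feeling for how small the basic syntax is, here is a minimal Python sketch that emits a DOT description from a list of edges. The graph name and node names are invented for illustration, and the helper `to_dot` is hypothetical, not part of Graphviz.

```python
def to_dot(name, edges, directed=True):
    """Build a DOT description from (source, target) pairs."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for source, target in edges:
        lines.append(f"    {source} {arrow} {target};")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical graph: node names are made up for this example.
dot_text = to_dot("pipeline", [("parse", "validate"), ("validate", "generate")])
print(dot_text)
```

Saving the printed text to a file and running Graphviz on it, for example with `dot -Tpng pipeline.dot -o pipeline.png`, should produce an image of the graph.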

Gherkin – A DSL to define functional tests

Gherkin is a DSL for defining functional tests. It has a very flexible syntax that makes it look almost like free text. Basically developers, analysts, and clients can sit around a table and define some scenarios. These scenarios will then be executable as tests to verify whether the application meets the expectations. For instance, here is how we could define the expectations for withdrawing from an ATM:

Scenario: Verify that withdrawing at the ATM works correctly

I really like this DSL because the barrier to using it is very low. This DSL, however, requires a developer to define some code using a General Purpose Language (GPL). In practice, a developer defines specific commands like “{name} has {amount}$ in his account” and writes the code that executes each command in the GPL chosen for the project (Ruby, Java, and others are supported).

Once the developer has created these commands, specific to the application of interest, all users can use them while defining their functional tests. It is also possible to start at the other end: first, you write your scenarios, as you want, trying to capture the requirements, and only later developers map each command to a corresponding function in a GPL.
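The mapping between sentences and code can be sketched in a few lines of Python. This is not the API of Cucumber or any other Gherkin tool: the `step` decorator, the `run` function, and the sentences themselves are assumptions made up to illustrate the idea, and the Given/When/Then keywords are left out for brevity.

```python
import re

STEPS = []  # (compiled pattern, function) pairs registered by developers

def step(pattern):
    """Register a function as the implementation of a scenario sentence."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

accounts = {}  # hypothetical application state

@step(r"(\w+) has (\d+)\$ in his account")
def set_balance(name, amount):
    accounts[name] = int(amount)

@step(r"(\w+) withdraws (\d+)\$")
def withdraw(name, amount):
    accounts[name] -= int(amount)

@step(r"(\w+) should have (\d+)\$ left")
def check_balance(name, amount):
    assert accounts[name] == int(amount), f"{name} has {accounts[name]}$"

def run(scenario):
    """Execute each line by dispatching it to the first matching step."""
    for line in scenario.strip().splitlines():
        sentence = line.strip()
        for pattern, func in STEPS:
            match = pattern.fullmatch(sentence)
            if match:
                func(*match.groups())
                break
        else:
            raise ValueError(f"No step defined for: {sentence}")

run("""
John has 100$ in his account
John withdraws 30$
John should have 70$ left
""")
print(accounts)
```

The scenario text is what analysts and clients read and write; the decorated functions are the part only developers need to touch.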

In other words, this DSL is great for hiding the real code behind a surface that everyone can understand and everyone can contribute to. It is much better to sit at a table and discuss the example we have displayed with a bank representative than it is to show him the hundreds of lines of Java which correspond to those commands, right?

SQL – Databases

You have probably heard of SQL. It is a language used to define how to insert, modify, or extract data from a relational database. Let's get some stats from the STATS table:

Certainly, you would not expect the average Joe to be able to write complex queries: SQL is not a trivial language, and it requires some time to be mastered. However, you do not need to be trained as a developer to learn SQL. Indeed, many DBAs are not developers.

Maybe Joe should not be trusted with writing access to the database, but he could get read access and write simple queries to answer his own questions instead of having to ask someone and wait to get an answer. Suppose he needs to know the maximum temperature during August in Atlanta:
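As a rough sketch of what such a query could look like, the following Python snippet builds a tiny in-memory SQLite database and asks the question. The columns of the STATS table are not shown in this article, so `city`, `month`, and `temp_f` are assumptions, and the rows are invented sample data.

```python
import sqlite3

# Hypothetical schema and data: column names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STATS (city TEXT, month INTEGER, temp_f REAL)")
conn.executemany(
    "INSERT INTO STATS (city, month, temp_f) VALUES (?, ?, ?)",
    [
        ("Atlanta", 7, 91.0),
        ("Atlanta", 8, 93.5),
        ("Atlanta", 8, 89.0),
        ("Denver", 8, 97.0),
    ],
)

# The query is the DSL part: it states what is wanted, not how to find it.
query = "SELECT MAX(temp_f) FROM STATS WHERE city = ? AND month = ?"
max_temp = conn.execute(query, ("Atlanta", 8)).fetchone()[0]
print(max_temp)  # prints 93.5 for the sample rows above
```

Everything outside the query string is plumbing; the single SELECT statement is all Joe would need to learn and adapt.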

Maybe Joe will never reach the level of a DBA, but he can learn a few basic queries and adapt them to his needs, making him more independent and letting his colleagues focus on their job instead of helping him out.

HTML – Web layout

I really hope you have heard of this quite successful language to define documents. It is amazing to think that we could have defined HTML pages 20 years ago when most people had desktop computers attached to monitors with a resolution of 640×480 pixels, and now those same pages can be rendered on the browser running on our smartphones. I guess it is a good example of what can be achieved with DSLs.

Note that HTML is really about defining documents: their structure and the information they contain. The same document is then rendered differently on a desktop computer, a tablet, or a smartphone. The same document can also be consumed differently by people with disabilities. For instance, specific browsers for people with impaired sight can help them consume a document defined with HTML. Such browsers read the content and support navigation to the different sections of the document.

CSS – Style

The Cascading Style Sheets (CSS) language defines the style used to visualize a document. We can use it to define how an HTML document will appear on the screen or how it will appear when printed.

CSS is not trivial to master, but many people with basic or no knowledge of programming can use it to change the appearance of a web page. This DSL has played an important role in democratizing web design.

ANTLR – Lexer and parser definitions

ANTLR comes with its own DSL to define lexer and parser grammars. Those are instructions for recognizing the structure of a piece of text. For example, this is a small snippet of a lexer grammar:

JavaCC, Lex, Yacc, and Bison are similar tools, and each comes with its own slightly different DSL, inspired by the Backus-Naur form.
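To illustrate what lexer rules do, without using the grammar syntax of ANTLR or any of these tools, here is a minimal Python sketch built on regular expressions. The token names and patterns are assumptions chosen for the example.

```python
import re

# Hypothetical lexer rules: (token name, regular expression) pairs.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("PLUS", r"\+"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))

def tokenize(text):
    """Split text into (token name, value) pairs, skipping whitespace."""
    tokens = []
    position = 0
    while position < len(text):
        match = MASTER.match(text, position)
        if match is None:
            raise SyntaxError(f"Unexpected character {text[position]!r} at {position}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

print(tokenize("total + 42"))
# prints [('ID', 'total'), ('PLUS', '+'), ('NUMBER', '42')]
```

A grammar DSL lets you state only the rules table; the tool generates the equivalent of the `tokenize` loop, and a parser on top of it, for you.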

Make – Build system

Make is a language to describe how to build something and the dependencies between different steps. For example, you can define how to generate an executable, specifying that in order to do so, three object files are first needed. Then you can define for each of those object files how to obtain it from a corresponding source file.

In this example, we specify that to create the program myExecutable we will need the object files, and once we have them, we will use gcc to link them together. We can also define some constants at the top of the file, so it is easy to change the Makefile later, if we need to.
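The core idea, targets that depend on other targets and a command to produce each one, can be sketched in Python as follows. The file names are invented, the `plan` function is hypothetical, and the sketch ignores the timestamp checks that Make uses to skip files that are already up to date.

```python
# Hypothetical rules: target -> (prerequisites, command). File names are invented.
RULES = {
    "myExecutable": (
        ["main.o", "parser.o", "utils.o"],
        "gcc -o myExecutable main.o parser.o utils.o",
    ),
    "main.o": (["main.c"], "gcc -c main.c"),
    "parser.o": (["parser.c"], "gcc -c parser.c"),
    "utils.o": (["utils.c"], "gcc -c utils.c"),
}

def plan(target, rules, done=None):
    """Return the commands needed to build target, prerequisites first."""
    if done is None:
        done = set()
    if target in done or target not in rules:
        return []  # already planned, or a plain source file with no rule
    done.add(target)
    prerequisites, command = rules[target]
    commands = []
    for prerequisite in prerequisites:
        commands.extend(plan(prerequisite, rules, done))
    commands.append(command)
    return commands

for command in plan("myExecutable", RULES):
    print(command)
```

The three compile commands come out before the link command, which is the ordering a Makefile lets you declare rather than program by hand.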

So what can we use DSLs for?

After looking at these examples, we can conclude that DSLs can be used for a variety of goals:

Defining commands to be executed
Describing documents or some of their specific aspects
Defining rules or processes

These are just some typical usages, but DSLs can serve many other purposes.

At this point, you should have some understanding of what a DSL can look like and can be used for. Now, let's see why you should use one, and then look at how to build one.

Why Should I Use a DSL Instead of My Favorite Programming Language?

If you are a developer, it is tempting to use your hard-earned proficiency with C# or Ruby to solve any possible problem—I get that.

However, there are also compelling reasons to solve some specific problems with a more specific, appropriate tool.

Yes, you can open a beer with a lighter or a fork, but if you had to open hundreds of them, your life would be easier if you just started using a bottle opener, right?

The same thing applies to programming: some problems are better solved with a specific language, for the reasons we are about to explore.

You should also consider another aspect: not everyone is a programmer like you. Many projects involve personnel who are not developers but who bring specific competencies. They are the domain experts.

Maybe you write accounting software, and you work with accountants and business consultants. In this case, a programming language can be easy to understand for you but read as complete gibberish to domain experts. A DSL would be much easier for them to digest, because it will be specific for their domain and speak their language.

Now, let's see the top three advantages of using a DSL compared with a GPL:

They are safer. Fewer things can possibly go wrong when using a DSL. When was the last time you had a Null Pointer Exception when working with HTML or SQL? Never. This is very important if we are doing something critical like dealing with someone’s health or money.
When there are errors, the errors are specific to the domain, so they are easier to understand. Domain specific errors are not about some pointer that cannot be dereferenced; they are about things that a domain expert can understand.
We can teach them more easily: they are limited in scope so less time and less training are needed to master them, simply because there is less stuff to study.

What are the Benefits of Using DSLs?

Domain Specific Languages are not marginal improvements over GPLs. In the right context, they can change how organizations work by having a strong impact on two different levels:

They let you communicate with domain experts. Do you write medical applications? Doctors do not understand when you talk about arrays or for-loops, but if you use a DSL that is about patients, temperature measures, and blood pressure, they may well understand it better than you do.
They let you focus on the important concepts. They hide the implementation or the technical details and expose just the information that really matters.

DSLs are tools to organize and express our thoughts in relation to a specific domain. All their advantages derive from this capability. Let's look at them in detail.

Communication with Domain Experts

In many contexts you need to build software together with domain experts who are not themselves developers. For example:

You may build medical applications, and therefore need to communicate with doctors to understand the treatment that companion software should suggest.
You may build marketing automation software. You would need the marketing people to explain how to identify clients matching a certain profile, so you can offer them a particular deal.
You may build software for the automotive industry. You would need to communicate with the engineers to understand how to control the brakes.
You may build software for accountants. You need to represent all the specific tax rules to apply in a given context, and you would need an accountant to explain them to you.

Now, the problem is that these domain experts do not have a background in software development, and the way developers and those domain experts communicate can be very different, because they speak different languages.

Developers talk about software, while domain experts talk about their domain.

By building a DSL, we build a language to communicate between developers and domain experts. This language will be understood by developers, domain experts, and also by the software, which will be able to execute the instructions specified in the DSL.

In some cases, you can give a DSL to domain experts and let them write their queries or logic alone. In practice, however, reaching a point where domain experts can use the DSL autonomously is rare. Typically, a domain expert describes what he wants to a developer, and the developer can immediately write down that description using a DSL. The domain expert could at this point read it and criticize it.

Generally, non-developers do not have the analytical skill to formalize a problem, but they can still read it and understand it, if it is written in a DSL that uses a lingo familiar to the user. Also, these tools can use simulators or run queries on the fly so that the domain expert can look not only at the code itself but also at the result.

These kinds of interactions in practice can have a very short turnaround: code can be written during a meeting or within days. In contrast, when using a GPL, the turnaround is measured at the very least in weeks, if not months or years.

Focus and Productivity

The fact that DSLs abstract away technical details and focus on the knowledge they should capture has important consequences.

On the one hand, it makes the investment in code written using DSLs maintain its value over time. As the technology changes, you can change the interpreter processing DSL code, but the DSL code can stay the same. An HTML page written 20 years ago can still be opened using devices that no one was able to imagine 20 years ago. In the meantime, the browsers have been completely rewritten multiple times, but the logic can be ported to new technologies.

I want to share a story about a company I worked with. This company created its own DSL to define logic for accounting and tax calculations. They started building this DSL 30 years ago, and at that time they generated console applications. Yes, applications that ran in consoles of 80×25 cells.

I worked with them re-engineering the compiler, and the same code of their DSL is now used to generate reactive web applications. How did this happen? Because the DSL captured only the logic, which was the really valuable part of the programs, and an extremely important asset for the company. The technical details were abstracted in the compiler. So we just had to change the DSL compiler to preserve the value of the logic and make it usable in a more modern context.

This story teaches us that:

Domain logic is what has value and should be preserved, while technology changes over time.

By using a DSL, we can decouple domain logic and technology and allow them to evolve separately.
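A toy Python sketch can make this decoupling concrete. The rule below stands in for a program written in a DSL, and the two functions stand in for two generations of back end. The rule, its field names, and both back ends are invented for illustration; they are not taken from the company in the story.

```python
# Hypothetical DSL "program": a tax rule captured as plain data.
RULE = {"name": "vat", "input": "net_amount", "rate": 0.2}

def to_console_report(rule, value):
    """Back end 1: evaluate the rule and render a plain-text line."""
    tax = value * rule["rate"]
    return f"{rule['name']}: {tax:.2f}"

def to_javascript(rule):
    """Back end 2: generate JavaScript source for a web front end."""
    return (
        f"function {rule['name']}({rule['input']}) "
        f"{{ return {rule['input']} * {rule['rate']}; }}"
    )

print(to_console_report(RULE, 100))  # prints "vat: 20.00"
print(to_javascript(RULE))
```

Replacing or adding a back end leaves `RULE` untouched, which is the property that lets domain logic outlive the technology it first ran on.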

Another advantage of hiding technical details is productivity. Think about the time spent reasoning about deallocating memory or choosing which implementation of a list would perform best for the case at hand. That time has a poor ROI. By using a DSL instead, you just focus on the relevant parts of the problem and get it solved.

How to Create Domain Specific Languages

You have seen why Domain Specific Languages are so cool and what benefits they can bring you. Now there is only one question left to answer: How do we build DSLs?

What Tools Can We Use to Build Domain Specific Languages?

There are different ways to build a DSL. The goal here is to build a language, with tool support, while keeping the effort reasonable.

We are not building the next Java or C#, so we are not going to pour tens of years into building an extra-complex compiler or an IDE with tons of features. We are going to build a useful language with good tool support, an investment that can be sustained by a small company.

Here, we are going to look at a few alternatives: some specific to building textual DSLs, and one intended for building graphical languages, or languages based on less common notations. You probably think only about textual languages, but Domain Specific Languages are broader than that.

Textual languages

These are the most classical languages. Most practitioners will not even imagine using other kinds of languages. Admittedly, we are all used to working with textual languages. They are easier to support and can be used in all sorts of contexts. However, to use them productively, a specific editor is mandatory. Let's see how to build textual languages and supporting tools.

Xtext is a solid solution to build textual languages, and in many cases it is your best choice.

With Xtext, you define your grammar in a way similar to how you would with ANTLR, but instead of getting just a parser, you get a nice editor. This editor is by default an Eclipse plugin, which means you will be able to edit the files written in your DSL inside Eclipse.

If you know how the Eclipse platform works, you can create an RCP application, i.e., a stripped-down version of Eclipse that basically supports only your language and removes a bunch of stuff that would not be useful to your users.

So Xtext gives you an editor and a parser. This parser produces for you a model of your code using the Eclipse Modeling Framework (EMF), which basically means that you have to study this technology.

I remember the long days reading the EMF book as one of the most mind-numbingly boring experiences I have ever had. I also remember asking questions on the Eclipse forums and not getting any answers. I opened bug reports that did not receive an answer until three years later (I am not joking).

So it was disheartening at first, but over time the community seemed to improve a lot. Right now, the material available on the Xtext website is incomparably better than it used to be, and the book from Lorenzo Bettini helped to make that possible.

The editors generated by Xtext can be deeply customized, if you know what you are doing. You can get away with minor changes with a reasonable effort, but if you want to do advanced stuff, you need to learn the Eclipse internals, which is not easy.

Recently, Xtext escaped the “Eclipse trap” by adding the possibility of generating editors for IntelliJ IDEA and… the web! Like many other developers, I switched to IntelliJ some years ago, and I was missing a way to easily build editors for IntelliJ IDEA.

So this was great news, even if the support for IntelliJ IDEA was not as mature and battle-tested as the one for Eclipse, because Xtext had been created to work with Eclipse.

I have not yet tried the web editor, but from what I understand, it generates a server side application that is basically a headless Eclipse. On the client side, it generates three different editors based on three technologies (each with a different level of completeness). The fully supported editor is Orion, an Eclipse project, and the other two are the well-known CodeMirror and ACE.

You may want to check out this list of projects implemented with Xtext to get an idea of what can be achieved with it.

Textual languages: other tools

If Xtext is not a good fit for your textual language, you may want to consider some alternatives. These are the ones I would keep an eye on:

TextX is a Python framework inspired by Xtext. You can define the grammar of your language with a syntax very, very close to the one used by Xtext. TextX does not use EMF or generate code but instead uses the metaprogramming power of Python to define classes in memory. While it seems nice and easy to use, textX does not generate editor support like Xtext, which is a major difference. If you want to get a better feeling of how textX works, take a look at this video.

There are other tools available, such as Spoofax. I have not used it, so I cannot vouch for it. It is more academic stuff than an industrial-grade language workbench, so I would suggest a bit of caution. Spoofax can be used inside Eclipse, and is based on a set of DSLs used to create other DSLs.

If you want to look into Spoofax, you may want to look at this free short book from Eelco Visser named Declare Your Language.

Projectional editors

Projectional editors are extremely powerful and exciting, but they are unfamiliar to many users. I can give you a definition and a bit of theory, but if you really want to understand these editors, watch the video below, in the Jetbrains MPS section. You could also take a look at this explanation of projectional editing written by Martin Fowler.

A projectional editor is an editor that shows a projection of the content stored on file. The user interacts with the projection, and the editor translates those interactions into changes to the persisted model. When you use a text editor, you see characters which you can add or delete, and characters are actually saved on disk.

In a projectional editor you could edit tables, diagrams, and even what looks like text, but those changes would be persisted in some format different from what you see on screen. This may be in some XML format, or it may be in a binary format. The point is that you can work with those files only inside their special editor.

If you think about it, this is also the case for all graphical languages: you see nice pictures, you drag them around, connect lines, and in the end the editor saves some obscure format, not the nice pictures you see on the screen. The big advantage of projectional editors is that they are much more flexible than your typical graphical language. You can combine different notations and support all sorts of representations you need for your case.
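The idea can be sketched in Python under heavy simplification: one persisted model, two projections of it, and an editing gesture that changes the model rather than any text. The model structure and function names are hypothetical and bear no relation to how any real projectional editor stores its data.

```python
import json

# Hypothetical model: a list of approval rules. The structure is an assumption.
model = {
    "rules": [
        {"condition": "amount > 1000", "action": "require_approval"},
        {"condition": "amount <= 1000", "action": "auto_approve"},
    ]
}

def project_as_text(model):
    """Projection 1: show each rule as a sentence-like line."""
    return [f"when {rule['condition']} then {rule['action']}" for rule in model["rules"]]

def project_as_table(model):
    """Projection 2: show the same rules as rows of a table."""
    header = ("condition", "action")
    rows = [(rule["condition"], rule["action"]) for rule in model["rules"]]
    return [header] + rows

def change_action(model, index, new_action):
    """An editing gesture in any projection ends up as a change to the model."""
    model["rules"][index]["action"] = new_action

change_action(model, 1, "notify_manager")
print(project_as_text(model)[1])
print(project_as_table(model)[2])

# What gets persisted is the model, not the text or the table shown on screen.
persisted = json.dumps(model)
```

Both projections reflect the edit immediately because neither of them is the source of truth; the stored model is.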

Confused? That is to be expected. Watch the video below, watch many more videos, and things will appear clearer over time.

Jetbrains MPS

Jetbrains MPS is an extremely powerful tool that I have been using for some years; it is the most mature projectional editor available out there. That is no accident: Jetbrains has invested significantly in developing it over more than a decade.

Want to see what it looks like? Watch the video.

Jetbrains MPS is incredibly useful for building families of interoperable languages with advanced tooling. Imagine using several DSLs to describe the logic of your problems, to define tests, to define documentation. Imagine all sorts of simulators, debuggers, and tools to analyze code coverage. All built on one platform.

Having everything on one platform means you need to be ready to embrace Jetbrains MPS and to invest a significant amount of time in properly learning it. However, if you are ready to make the investment, it can revolutionize your processes.

What Do I Need to Make My DSL Succeed?

There are just two things that will seem obvious but are not:

You need your users to use your DSL
You need your users to get benefits from using your DSL

To achieve these, you will need to build the right tool support and adopt the right skills. We are going to discuss all of this in this section.

Get users to use it

You need to win the support of users. When I was getting my Ph.D., I conducted a survey on the reasons why DSLs are not adopted. There are several causes, but one important factor is resistance from users, especially when they are developers.

If you target a DSL to developers, they may resist because they feel they are not as in control as when they use a General Purpose Language (GPL). They may also fear that a DSL lowers the bar, being simpler to use than, let's say, Java.

Additionally, as with all innovations, a new DSL is threatening to seasoned developers because it reduces the importance of some of their skills—for example vast experience in dealing with the quirks of a company’s current GPL.

If your DSL is intended for non-developers, it is generally easier to win their support. Why? Because you are giving them a superpower: the ability to do something on their own.

They may be able to use a DSL to automate a previously manual procedure. Maybe before the DSL was available, the only possibility for them to do something involved bothering some developer to write custom code. With a DSL, they get more power and independence. Still, they may resist adopting it if they perceive it as too difficult or if they feel it does not match their way of thinking.

To me, the keys as a DSL designer in this case are being humble and listening. Listen to the developers, work on capturing their experience and embedding it in the design of the DSL or the tooling around it. Involve them in the design of the DSL. When talking with your users, technical or not, communicate that the DSL will be a tool for them, designed to support them, and derived from their understanding of the domain at hand. When designing DSLs, the cowboy approach does not work; you need to succeed as a team or not succeed at all.

Give benefits to users

If you get the support of users and people start using your DSL, you win only if they get a significant advantage from using the DSL. We have discussed the importance of a DSL as a communication tool and medium to support co-design. This is vital, but in addition to this, you can significantly increase the productivity of your users by building first-class tool support. A few examples:

A great editor with syntax highlighting and auto completion so that learning the language and using it feels like a breeze
Great error messages: a DSL is a high-level language, and errors can be very significant for users
Simulators: nothing helps users as much as the possibility to interact with a simulator and see the results of what they are writing
Static analysis: in some contexts the possibility to analyze the code and reassure against possible mistakes is a big win

These are a few ideas but more can be adopted, depending on the specific case. Specific tools offer support for specific languages.

Books on Domain Specific Languages

If you are serious about learning DSLs, here is a list of books you could look into.

DSL Engineering by Markus Völter

This PDF version of the book is donation-ware, so you can just read it and donate. Alternatively, you can find the printed version on Amazon.

The book will help you on different levels. First, it is very useful for setting your terminology straight. In addition to that, you will learn some good principles of DSL design. If you do not have direct access to an expert to teach you how to design DSLs, reading this book is the best alternative I can recommend (together with as much practice as you can, of course).

Then comes the part about implementation: remember that Markus has a PhD, but he is first of all someone who gets things done. This part is very well written, with examples based on Xtext, Spoofax, and MPS. Part IV is about scenarios in which DSLs are useful. Given that this is based on his extensive experience in this field, there are a lot of interesting comments.

I’ve had the occasion to work with Markus, and he is simply the best in this field, so if you can learn something from him, do it. Read his books, watch his presentations, follow his projects. It will be a good way to invest your time.

Domain Specific Languages by Martin Fowler

Fowler is a famous thought leader and bestselling author. He writes with clarity, especially about both internal and external DSLs, and the mental models presented in the book are useful and elegant. However, if you find internal DSLs as irrelevant as I do, you may be interested in only some portions of this book.

There are 15 chapters dedicated specifically to external domain specific languages. While those chapters are organized around implementation techniques, there are comments and remarks from which you can learn some design principles. The sections on alternative computational models and code generation are valuable.

You will have a hard time finding an exploration of these topics at this level of detail anywhere else. The book is seven years old, and the techniques may have evolved since the book was written, but the vast majority of the considerations presented in the book are still valid. And of course, they are thoughtful and well explained, as you would expect from Martin Fowler.

Language Implementation Patterns by Terence Parr

If you are interested in textual languages and in particular ANTLR, you should definitely look into this book.

The book starts by discussing different parsing algorithms. If you like to learn how stuff works, you should take a look at these chapters.

Then there are chapters about working with the Abstract Syntax Tree, extracting information, and transforming it. This is the kind of stuff you need to learn if you want to become a Language Engineer.

Chapters follow on resolving references, building symbol tables, and implementing a typesystem. These are the foundations to learn how to process the information expressed in your DSL.

Finally, Terence explains how to use the information you have processed by building an interpreter or a code generator. At this point you end your journey, having seen how to build a useful language from start to finish. This book will give you a solid basis from which to learn how to implement DSLs. The only thing missing is a discussion on how to design DSLs, but since that isn’t the goal of this book, it’s not a big deal.

MPS Language Workbench by Fabien Campagne

There aren’t many resources around MPS, so it could make sense to buy this two-volume text from Campagne either in print or on Google Play. The two volumes explain in detail all the many features of MPS (admittedly some are a bit obscure).

One thing missing is more advice on language design. These books are very good references to learn how MPS works, but there is not much guidance on how to combine these features to get your results. One reason for that is that MPS is an extremely powerful tool, which can be used in very different ways, so it is not easy to give general directions.

Volume I explains separately the different aspects of a language: how to define the structure (the metamodel), how to define the editors, the behavior, the constraints, the typesystem rules, and so on. Most of the chapters are in reference-manual style (e.g. the chapter “The Structure Aspect” or “Structure In Practice”). Everything you need to learn to get started and build real languages with MPS is explained in Volume I.

Volume II is mostly about the advanced stuff that you can safely ignore at the beginning. I suggest looking into this book only when you feel comfortable with all the topics explained in Volume I. If you have never used MPS before, it will take some time.

Volume II explains how to use the build framework to define complex building configurations, and it gives you an overview of all the different kinds of testing you may want to use for your languages. It also shows you how to define custom aspects for your language or custom persistence.

Implementing Domain-Specific Languages with Xtext and Xtend by Lorenzo Bettini

I’ve reviewed the second edition of this practical and enjoyable book.

If you want to learn how to write textual languages with good tool support, you could start by following a couple of tutorials on Xtext and then jump to this book. This book will explain everything you need to know to use Xtext to build rather complex editors for your language. The only caveat is that Xtext is part of a complex ecosystem, so if you really want to become an expert in Xtext, you need to learn EMF and Xtend.

The book does a good job of teaching you what you need to know to get started on these subjects, but you may have to complete your education with other resources when you want to progress.

What I like about this book is that it is not a reference manual, but it contains indications and opinions on topics like Scoping or building typesystem rules (the author has significant experience in this specific topic). Also, the author is interested in best practices, so you will read his take on testing and continuous integration. This is the kind of stuff you should not ignore if you are serious about language engineering.

DSLs in Action by Debasish Ghosh

If you want a gentle introduction to the topic of DSLs in general, this is an interesting book, though it has some problems. Specifically, it focuses way too much on internal DSLs which are, as we all know, not the real thing. Plus, they misspelled my name (-1 point for that).

There is not much on external DSLs: the author briefly discusses Xtext and then spends a chapter on using Scala parser combinators to build external DSLs. If you are interested in learning how to implement an external DSL, do not pick this book. However, if you prefer internal DSLs to external DSLs, or if you want to read every available resource on DSLs, this book may be a good choice.

Domain Driven Design by Eric Evans

This is a relevant and important book to read because you need skills to understand a domain in order to represent it in your language and design your language to capture it.

Now, I should probably just praise this book and stress how much I have enjoyed it. Unfortunately, I tend to err on the honesty side, so I warn you: this is one of the most boring books I have ever read. It is important, it is useful, it is great, but it is just so plain and long.

It stayed on my night stand for months. What you should get out of this book is the importance of capturing the domain in all of your software artifacts. The book stresses the importance of building a common language to be shared among the stakeholders. This is completely and absolutely relevant if you want to build Domain Specific Languages. The book does not include how to map this domain model to a language, but it is a good complement to other books specific to DSL design.

DSLs to empower your users

There are many reasons why you should really consider Domain Specific Languages. I have seen companies benefit enormously from DSLs. Most of the people I have worked with see DSLs as a key differentiator that helps them increase productivity by 10-20 times, reducing time-to-market and feedback cycles, increasing the longevity of their business logic, and much more.

Aside from the practical benefits, I find the topic extremely fascinating. Most of all, I feel that by building DSLs we build powerful tools that help other people do their job. As language designers we act as enablers; our languages can be used by skilled professionals to achieve great things, and this is an amazing feeling for me.

If you are interested in Domain Specific Languages, you can take a look at an extended version of this article: The Complete Guide to Domain Specific Languages. It contains more examples, a comparison between more tools, tips on building DSLs, and more resources.