Abstraction in programming: Taming the ones and zeros

We can now accomplish some pretty incredible things with technology. What once seemed wildly futuristic is now becoming reality.

Say, for example, you wanted to develop a smart home system that would open and close your windows when certain conditions were present. You would need to equip your windows with temperature and moisture sensors and then go about programming the system, so the windows would adjust according to the weather. However, simply telling the system to open the windows when it’s pleasantly warm and close the windows when it’s raining heavily wouldn’t work. These instructions leave far too much open to interpretation. The system would need very specific input, such as temperature thresholds, exact moisture levels, etc., to perform properly. The same goes for any programmed system.
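To make the difference concrete, the window rule might be sketched as follows. The threshold values, the 0.0 to 1.0 moisture scale, and the name decide_window are assumptions made for illustration; they do not come from any real smart home product.

```python
# Hypothetical thresholds: the values below are assumptions for illustration.
MIN_OPEN_TEMP_C = 20.0   # assumed lower bound of "pleasantly warm"
MAX_OPEN_TEMP_C = 27.0   # assumed upper bound of "pleasantly warm"
MAX_MOISTURE = 0.30      # assumed sensor reading (0.0 to 1.0) that counts as heavy rain


def decide_window(temperature_c: float, moisture: float) -> str:
    """Return "open" or "close" for one pair of sensor readings."""
    if moisture >= MAX_MOISTURE:
        return "close"  # rain takes priority over temperature
    if MIN_OPEN_TEMP_C <= temperature_c <= MAX_OPEN_TEMP_C:
        return "open"
    return "close"  # too cold or too hot


print(decide_window(22.5, 0.05))  # warm and dry
print(decide_window(22.5, 0.80))  # warm but raining
```

Every vague phrase in the spoken instruction has become an explicit number and an explicit order of checks, which is the kind of input the system can actually act on.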

When looking at modern applications, systems and capabilities, it’s hard to believe that to work properly, all the programming that goes into them still has to be rendered into bits and bytes composed into strings of binary code. From the coolest looking smartphone app, to the most sophisticated enterprise software, and even what seem like futuristic technologies, such as smart home features and autonomous vehicles — all require their instructions to be delivered in binary.

Why is this? Computers don’t work well with ambiguity and nuance. Binary provides the completely unambiguous instructions of either “off” (zero) or “on” (one). Computers use these two states as the basis for logical computation: each circuit is either “on” or “off.” These simple circuits are combined into logic gates (for example, AND, OR, and NOT), which allow data to be manipulated in a variety of ways. This pattern is then duplicated billions of times to create modern CPUs.
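As a rough software model of that idea, the sketch below builds an XOR operation and a one-bit adder out of nothing but AND, OR, and NOT. The function names are chosen here for illustration, and real gates are electronic circuits rather than Python functions.

```python
def AND(a: int, b: int) -> int:
    return a & b


def OR(a: int, b: int) -> int:
    return a | b


def NOT(a: int) -> int:
    return 1 - a


def XOR(a: int, b: int) -> int:
    # "a or b, but not both", built only from the three basic gates
    return AND(OR(a, b), NOT(AND(a, b)))


def half_adder(a: int, b: int) -> tuple:
    """Add two single bits and return (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Chaining adders like this one is how a handful of on/off decisions turns into arithmetic on larger numbers.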

This kind of unambiguous input is created with a system that offers only two options: yes or no, on or off, one or zero. But the fact that the final input has to be configured in a way the machine can process doesn’t mean that we as humans have to completely adapt to the machine’s way of “thinking.” If we were forced to enter instructions only in binary formats, programming would be incredibly tedious, and the science and practice of computing might never have reached the level we see today.

What is abstraction in programming?

Abstraction, in a programming context, aims to hide as much complexity as possible to allow programmers to focus on what’s most important and relevant. Abstraction is used to camouflage much of what is vital to making a program work, but is innately complex due to machines’ requirements for binary input. Hiding the complexity, however, should not in any way negatively impact the power and versatility of the machine and its output. To the contrary, adding abstraction layers should result in more robust and useful outputs.

Abstraction allows programmers to focus on what they want to accomplish with their programs, rather than all the individual steps needed to get there.

This concept is not unique to programming. Abstractions are common in other areas of our lives. Starting your car is one obvious example. Modern cars can be started by turning a key, pushing a button or simply standing near your car with a fob in your pocket. All the mechanical steps required to bring the car to life have been abstracted to simplify the driving experience.

The need for abstraction

In the early days of electronic computing, programming was often done using cards or long rolls of paper that were punched with holes. These holes were patterned to represent the binary strings that would be fed into the computer.

Amazingly enough, this form of programming was actually invented in the early 1800s by a French weaver named Joseph-Marie Jacquard, who used punched cards to direct threads in a loom to create intricately colored woven textiles.

After computations were completed, the output would be generated in the same way the input was fed to the machine — on cards or paper rolls full of holes, which aren’t necessarily easy for a person to decipher. A further step was therefore necessary, using yet another device to decode the output rendered by the computer into a human-readable format. This was a very early example of abstraction.

Humanizing machine code

Machine code, also known as machine language or object code, is the binary input that machines require to generate output. But people can’t be expected to express themselves in terms of ones and zeros when writing programs. How has the field of software development bridged this gap? By addressing both sides of the issue: Giving developers a way to program using more natural language, and providing a way to translate this language into a format that a machine can use.

The second half of this equation involves using a language processor, such as a compiler, interpreter or assembler, to translate the code written by the programmer into a format the machine can process. Assemblers are generally used to translate the so-called low-level (assembly) languages into object code, while compilers are used when programming in high-level languages. Interpreters translate a single line of code and execute that line before moving on to the next. Any errors found along the way will halt the entire process until the error is corrected.
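The line-by-line behavior of an interpreter can be sketched with a toy example. The three-command language below (SET, ADD, PRINT) is invented for this illustration and does not correspond to any real language; the point is only that each line is translated and executed in turn, and that the first error stops everything after it.

```python
def interpret(source: str):
    """Run a toy program line by line.

    Returns (output_lines, error_message); error_message is None on success.
    """
    variables = {}
    output = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, args = parts[0], parts[1:]
        try:
            if command == "SET":
                name, value = args
                variables[name] = int(value)
            elif command == "ADD":
                name, value = args
                variables[name] += int(value)
            elif command == "PRINT":
                (name,) = args
                output.append(str(variables[name]))
            else:
                raise ValueError(f"unknown command {command!r}")
        except (ValueError, KeyError) as exc:
            # Halt at the first error; later lines are never executed.
            return output, f"line {line_number}: {exc}"
    return output, None


program = "SET x 5\nADD x 3\nPRINT x\nJUMP x\nPRINT x"
print(interpret(program))
```

Here the first three lines run and produce output, the unknown command on line 4 halts the run, and the final PRINT is never reached.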

The first half of the equation, however, is a story of ever-increasing abstraction. Each generation of programming languages, from low-level to high-level, has added a higher level of abstraction and brought more intuitive language into programming.

Generations of programming languages

As mentioned above, programming languages can be broken down into high-level languages and low-level languages. An even more granular division is generations. The first two generations consist of low-level languages, while the third through fifth generations are populated by high-level languages. Each successive generation represents an evolution toward using more natural language, which is accomplished by adding layers of abstraction.

There are currently five generations of programming languages, but the fifth generation, known as 5GL, is still very much a work in progress and is mainly used in artificial intelligence (AI) research. The evolution from generations one to four, however, provides a comprehensive illustration of how much abstraction has changed the way programmers work.

First-generation languages (1GL)

This language group consists of the machine code required by the hardware to generate its output. It is completely binary, with only ones and zeros providing the instruction directly to the computer CPU to carry out its computations. Though a 1GL program would be incredibly tedious to write, if it’s error-free, it would run very fast, as there would be no additional translation necessary from the coding language to the language the computer can process. This way of programming, however, comes with more challenges than advantages.

As a means of storing and manipulating data, binary remains the foundation of computing today. Each subsequent language generation, therefore, has had to incorporate more and more abstraction to enable developers to think and work more like people and less like machines. The aim of each evolution is to retain as much as possible of the efficiency of working with data directly in binary.
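To see what "binary as the foundation" means for ordinary data, the short sketch below converts text to the bit patterns that store it and back again. It assumes plain ASCII text, and the helper names to_bits and from_bits are chosen here for illustration.

```python
def to_bits(text: str) -> str:
    """Show each ASCII character as the 8 bits used to store it."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_bits(bits: str) -> str:
    """Rebuild the text from space-separated groups of 8 bits."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("ascii")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Even a two-letter word becomes sixteen ones and zeros, which gives a sense of why nobody wants to write whole programs at this level.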

Second-generation languages (2GL)

2GLs, known as assembly languages, are still considered low-level languages. This is the first generation of programming languages that started to address the programmers’ need for having a more natural language method of programming, while still satisfying machines’ need for binary input. 2GLs do this through the use of assemblers, which translate programmers’ input into binary that can then be processed by a machine. This marked an important shift in programming by placing more emphasis on the human side of computing.

2GLs provide the same ability to store, locate and manipulate data as 1GLs. However, instead of using all ones and zeros, 2GLs use mnemonics — character combinations that represent operations, such as MOV for move and SUB for subtract — to instruct the machine as to what operation is to be performed. Syntax determines what the operations are acting on, such as a memory location (MOV can move data from one place within the computer’s memory to another) or a numeric constant (SUB allows you to subtract one numeric constant from another).
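What an assembler does with those mnemonics can be sketched in a few lines. The opcodes, register names, and 8-bit instruction layout below are invented for this illustration and do not match any real processor; a real assembler handles many more instructions and operand types.

```python
# Invented instruction set: 4-bit opcode + 2-bit destination + 2-bit source.
OPCODES = {"MOV": "0001", "SUB": "0010"}
REGISTERS = {"R0": "00", "R1": "01", "R2": "10", "R3": "11"}


def assemble(line: str) -> str:
    """Translate one line such as "MOV R1, R2" into an 8-bit binary string."""
    mnemonic, operands = line.split(maxsplit=1)
    dest, src = [operand.strip() for operand in operands.split(",")]
    return OPCODES[mnemonic] + REGISTERS[dest] + REGISTERS[src]


print(assemble("MOV R1, R2"))  # 00010110
print(assemble("SUB R0, R3"))  # 00100011
```

The programmer writes the readable form on the left; the machine only ever sees the bit string on the right.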

Since 2GLs use character combinations that are more recognizable to people, writing, modifying, and debugging programs is much easier than when programming with a 1GL. Though a step in the right direction, 2GLs still suffered from poor portability and limited practical use. Even though they are somewhat more natural than the ones and zeros of 1GLs, 2GLs still require a tremendous amount of focus on the minutiae involved in achieving the desired output.

Third-generation languages (3GL)

While the first two generations of programming languages were highly machine-dependent, 3GLs, which are sometimes referred to as mid-level and sometimes high-level languages, can run on different machines, which is in and of itself a major innovation. From the perspective of abstraction, there’s much more going on.

3GLs don’t replace the assembly and machine languages that came before, but are built on top of them with additional layers of abstraction to reflect more natural language use. Low-level languages focus more on bits and bytes, where you have to expressly instruct the machine to locate or relocate every piece of data, signify what type of data it is through the use of syntax and include instructions on what needs to happen to that data under which circumstances.

You could say that this approach is highly computational, which forces the developer to focus mainly on individual tasks the machines must carry out. 3GLs are the first step toward allowing programmers to tackle larger and more diverse requirements, as they do when programming enterprise applications. And they can do so with less complex, or simply less, code. It’s also not just the actual language and syntax that’s being abstracted at this level. Some 3GLs take care of other programming issues that previously were very manual, such as removing unused objects clogging up memory or providing template libraries and other tools with tested code blocks that are ready to use. Examples of 3GLs include COBOL, BASIC, Fortran, C, C++, Java, and Pascal. Though 3GLs utilize entire words and statements formed by those words, which is a considerable step forward from 2GLs, they are still very much procedural languages that require explicit instructions for every step involved in carrying out a task.

Fourth-generation languages (4GL)

As is now clear from the evolution up to this point, 4GLs are the next step in making programming code less about scripting instructions for machines to carry out individual tasks, and more about using language to define the desired results of the program. This is what is known as declarative programming, and it differs from imperative programming, which focuses on the means to the end, rather than the end itself. Think of something as simple as printing a document. At most, a user has to click a “print” button, choose which printer should carry out the task, and it’s done. This is a declarative action. An imperative approach to printing would require you to tell the machine exactly where the item to be printed is located in the machine’s memory, how it should be spooled, where to place it in the queue with other jobs, etc.
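The same contrast can be shown in code. The sketch below computes one customer's order total twice: once imperatively, spelling out each step, and once declaratively, by describing the result in SQL (a classic 4GL) and letting the database engine decide how to produce it. The sample data, table name, and function names are made up for this illustration; the SQL is run through Python's built-in sqlite3 module.

```python
import sqlite3

ORDERS = [("alice", 30), ("bob", 45), ("alice", 25)]  # made-up sample data


def total_imperative(orders, customer):
    """Spell out how: visit every row, test it, and accumulate."""
    total = 0
    for name, amount in orders:
        if name == customer:
            total += amount
    return total


def total_declarative(orders, customer):
    """State what is wanted and let the database work out how."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    connection.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    (total,) = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    connection.close()
    return total


print(total_imperative(ORDERS, "alice"))   # 55
print(total_declarative(ORDERS, "alice"))  # 55
```

Both functions return the same number, but only the first one says anything about loops, comparisons, or running totals.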

4GLs allow programmers to focus on what they want to accomplish with their programs, rather than all the individual steps needed to get there.

One of the most frequently used illustrations of how abstraction has simplified programming is the “Hello World” program. A simple program to display the words “Hello World” can take 56 lines of code in an assembly language. In Ruby, a high-level language sometimes grouped with the 4GLs, the instructions can be given with a single line.

With this increased level of abstraction, 4GLs offer a much broader variety of uses than their predecessors. 4GLs are usually not general-purpose languages, but rather specialized languages that can be used to query databases, generate reports, automate tasks, manage data, create applications and graphical user interfaces (GUIs), build websites, and much more.

It could be argued that the simplicity achieved through this level of abstraction comes with a tradeoff in performance. But the fact that 4GLs are more accessible than earlier languages means a wider pool of potential users now can innovate with technology that was previously unavailable to them. The gains from unlocking a broader range of human creativity more than make up for any diminishment in performance.

From scripting to dragging and dropping: Abstraction through low-code

Until recently, the incremental improvements brought about by each layer of abstraction have mostly shifted toward using more words, abbreviations and syntax that people can understand, rather than the machine code that the machines can process. Low-code, though also categorized as a 4GL, takes this abstraction a step further and enables developers to assign functionality to their programs with very little coding.

Low-code can perhaps be more accurately described as “hidden code” due to the level of abstraction in low-code application development platform tools.

Instead of code, low-code platforms have a visual GUI that allows developers to manipulate drag-and-drop components to deliver the desired outcomes. These components come pre-configured and can be used and reused for operations such as calling, storing, or editing data; creating integrations with other applications; displaying user interfaces; sending notifications; or many other capabilities required in modern digital workflows.

With low-code, developers can still access the underlying code to create any custom programming they need, but the heightened level of abstraction allows them to breeze through the design and build process of most of the basic functionality. This gives programmers back valuable time during the development process to really focus on what is most essential, which is precisely what every new language and language generation has set out to do with each new abstraction layer.

Low-code also allows individuals to build larger, process-driven applications that would typically require a team of high-code developers versed in text-based languages such as Java, Python, C++, Ruby, and SQL. Since many of the common low-code application patterns are pre-built and part of the platform, developers only need to tell the application what to do, with commands like “retrieve emails from Outlook,” and never how to do it. This makes low-code one of the most declarative methods of programming available.

For example, building an enterprise case management application with a high-code 4GL can take months, and with a 3GL it could take years. With low-code, a developer can build an enterprise-grade application in weeks or even days. What’s more, even non-professional developers, the so-called citizen developers, can learn to program with low-code and build useful enterprise applications without having to master the deeper knowledge needed when writing applications with traditional high-code languages.

What is the future of abstraction in programming?

Based on what’s come before, we can make some educated guesses as to what forms programming abstraction will take in the future, and what future abstractions will have to do to make programmers’ lives even easier.

Whatever the format and mechanics of future types of abstractions, they will certainly see their share of both proponents and detractors. Programmers have historically been reluctant to accept higher levels of abstraction due to the perceived loss of control over the actual computer input — below all the abstraction. The fact remains that we are now relying on machines to do so much more than they were able to do when programming was done in machine code.

Abstractions, while hiding complexity and creating more distance to that machine code, are helping programmers get closer to the actual problems they’re trying to solve. There is still a tremendous amount of art and skill involved in creating programs, even when most or all of the code has been abstracted away. But abstraction allows programmers to focus that art and skill where it’s most required, rather than exhausting it in the plotting of each one and zero it takes to get the job done.

This is precisely where low-code shines. With low-code, developers can focus purely on solving the business problems that plague their non-IT business partners, instead of wrestling with the various high-code concerns. Low-code is the logical next step in the evolution of abstraction, but it most certainly is not the last.

Susan Coleman is the content marketing manager at Appian.
