This question comes up in almost every entry-level programming interview, and the answer seems very simple on the surface. But you know interviewers: they don’t stay at the surface, they dig deep into everything to spoil your happiness. Let us analyze what the real deal is with compiler vs interpreter.
If we go back in history, there were only pure compilers and pure interpreters to start with. A pure compiler takes the source code and converts it into machine code, a set of binary instructions that the CPU executes.
An interpreter, on the other hand, reads the source code at run time, line by line, and carries out each statement as it goes, without first producing a separate machine-code file.
But that is history.
Improvements in compilers and interpreters over time have produced multiple variants of both, with overlapping features. So, let us look beyond the simplistic age-old definitions and understand modern-day compiler and interpreter implementations.
What is a Compiler?
Compiling is a very broad term, and the output of compilation varies depending on the use case. Compiling simply means translating your code from one language or form to another language or form. The output doesn’t necessarily have to be the final machine code; it can be something in between.
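As a small illustration that compilation need not end in machine code, here is a sketch using Python’s built-in `compile()` and the `ast` module, assuming the standard CPython implementation. The source text is translated first to a syntax tree and then to a code object holding bytecode; neither form is machine code. The variable names are made up for this example.

```python
import ast

source = "total = price * quantity"

# Form 1 -> form 2: source text to an abstract syntax tree.
tree = ast.parse(source)

# Form 2 -> form 3: syntax tree to a code object containing bytecode.
code = compile(tree, filename="<example>", mode="exec")

print(type(tree).__name__)               # Module
print(type(code).__name__)               # code
print(isinstance(code.co_code, bytes))   # True: raw bytecode, not CPU instructions

# The code object still needs an interpreter to run it.
namespace = {"price": 4, "quantity": 3}
exec(code, namespace)
print(namespace["total"])                # 12
```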
This also means that there can be multiple compilers for a single language, and you would use one of the compilers depending on your use case.
You will also find purpose-built converters that translate code from one language to another. For example, there is a Java-to-Kotlin converter that translates existing Java code to Kotlin.
The distinction between converters and compilers is blurry; at a high level, we can treat converters as a subset of compilers.
What is an Interpreter?
As mentioned above, the classic interpreter reads the source code line by line. Interpreters are also designed for specific input formats: one may read source code, another bytecode, another scripts. Whatever the input, the interpreter does not hand you a translated file. It executes the program directly, and the CPU instructions that actually run are those of the interpreter itself.
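To make the idea concrete, here is a minimal sketch of an interpreter for a made-up three-word mini-language (`set`, `add`, `mul` are invented for this example, not part of any real tool). It reads the program one line at a time and acts on each line immediately; nothing is written to disk.

```python
def interpret(program: str) -> dict:
    """Execute a made-up mini-language one line at a time."""
    variables = {}
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        op, name, value = line.split()
        # The operand is either an existing variable or an integer literal.
        operand = variables[value] if value in variables else int(value)
        if op == "set":
            variables[name] = operand
        elif op == "add":
            variables[name] += operand
        elif op == "mul":
            variables[name] *= operand
        else:
            raise ValueError(f"unknown instruction: {op}")
    return variables


program = """
set x 5
add x 2
set y x
mul y 3
"""
result = interpret(program)
print(result)  # {'x': 7, 'y': 21}
```

Note that an error on the last line would only be discovered after the earlier lines had already run, which is typical of line-by-line execution.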
So, based on the above, the clear differences between a compiler and an interpreter are as follows –
- An interpreter always executes its input, irrespective of the input format; a compiler, on the other hand, produces the output format it is designed for, which may or may not be machine code.
- An interpreter reads the provided input piece by piece and executes it on the fly; a compiler, on the other hand, translates the entire code up front and generates machine instructions or another output format for later use.
- A compiler’s output is typically stored on disk, whereas an interpreter’s work happens in memory at run time and is usually not saved.
Also note that when the compiler processes the whole code, it does a lot more than just translate it: it analyzes and optimizes the code for the best performance at run time. A pure interpreter, on the other hand, executes code line by line and hence has little opportunity to optimize across the program. For that reason, languages that do not use compilers to generate precompiled machine code tend to be slower at run time.
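One small optimization of this kind, constant folding, can be observed with Python’s own bytecode compiler. This is a sketch that assumes CPython; the folding is an implementation detail rather than a language guarantee, and the variable name is made up.

```python
import dis

code = compile("seconds_per_day = 24 * 60 * 60", "<example>", "exec")

# The multiplication is done once, at compile time, so the finished
# product is stored as a constant instead of being recomputed at run time.
folded = 86400 in code.co_consts
print(folded)  # True on CPython

# Show the resulting bytecode: it loads a single constant.
dis.dis(code)
```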
But there is another option, in between compiler and interpreter –
Just in Time compiler (JIT)
In this scenario, the code is compiled to an intermediate format, say bytecode, but instead of an interpreter executing the bytecode instruction by instruction, the JIT compiler translates it to machine instructions at run time, just before it is needed. Many JIT compilers translate only the frequently executed ("hot") parts of the program. This improves performance at run time.
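The idea can be sketched with a toy model. This is not how a real JIT is built: `ToyJit`, `slow_eval`, and `HOT_THRESHOLD` are hypothetical names, and the "compiled" form here is a Python code object rather than machine instructions. What it does show is the switch from a slow interpreted path to a compiled path once a piece of code becomes hot.

```python
import ast
import operator

HOT_THRESHOLD = 3  # hypothetical: interpreted calls allowed before compiling

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def slow_eval(node, env):
    """Walk the syntax tree on every call (the interpreted path)."""
    if isinstance(node, ast.Expression):
        return slow_eval(node.body, env)
    if isinstance(node, ast.BinOp):
        left = slow_eval(node.left, env)
        right = slow_eval(node.right, env)
        return OPS[type(node.op)](left, right)
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported expression")


class ToyJit:
    def __init__(self, expression):
        self.tree = ast.parse(expression, mode="eval")
        self.calls = 0
        self.compiled = None  # filled in once the expression is hot

    def run(self, **env):
        self.calls += 1
        if self.compiled is None and self.calls > HOT_THRESHOLD:
            # Hot: translate once, then reuse the compiled form.
            self.compiled = compile(self.tree, "<jit>", "eval")
        if self.compiled is not None:
            return eval(self.compiled, {"__builtins__": {}}, env)
        return slow_eval(self.tree, env)


area = ToyJit("width * height + 1")
results = [area.run(width=i, height=2) for i in range(5)]
print(results)                    # [1, 3, 5, 7, 9]
print(area.compiled is not None)  # True
```

The first three calls take the tree-walking path; from the fourth call onward the compiled form is used, and the results are the same either way.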
Also, note –
Machine code is CPU or processor-specific. An interpreter is itself a machine-code program, so it is not universal: you need one build of the interpreter for Intel (x86) processors and a different one for ARM processors.
Compiler vs Interpreter Examples –
To understand the differences even better, we will pick some popular programming languages and see which options each one uses to get from source code to instructions running on the CPU.
Let us start with the C language.
C – Compiler generates Machine Code
Let us take the example of a C program. In a typical scenario, the C compiler directly generates machine code. The compiler does all the processing in between, which includes –
- Preprocessing – Removes comments, expands included files, and expands macros.
- Compiling – Converts the preprocessed source code to assembly language.
- Assembly – Converts the assembly code to binary object code, thousands of 0s and 1s.
- Linking – Combines object modules and libraries into one executable file. Linking is of two types, static and dynamic.
If you look closely at the above, the developer writes source code, and the compiler takes it through these four steps and generates machine code.
Also note that once the source code is compiled, the CPU/machine no longer needs it to execute the program. And since the machine code is generated beforehand and is ready for the CPU, the program starts without any translation step and generally runs faster.
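The four stages can be run one at a time to see each intermediate file. The sketch below drives the compiler from Python and assumes a Unix-like system with a GCC- or Clang-compatible compiler available as `cc` or `gcc`; the file names and the `build_in_stages` helper are made up for this example. If no compiler is found, the function simply returns `None`.

```python
import pathlib
import shutil
import subprocess
import tempfile

C_SOURCE = '#include <stdio.h>\nint main(void) { puts("hello"); return 0; }\n'


def build_in_stages(workdir):
    """Run the four stages separately; return None if no C compiler is found."""
    cc = shutil.which("cc") or shutil.which("gcc")
    if cc is None:
        return None
    work = pathlib.Path(workdir)
    (work / "hello.c").write_text(C_SOURCE)
    steps = [
        [cc, "-E", "hello.c", "-o", "hello.i"],  # preprocessing
        [cc, "-S", "hello.i", "-o", "hello.s"],  # compiling to assembly
        [cc, "-c", "hello.s", "-o", "hello.o"],  # assembling to object code
        [cc, "hello.o", "-o", "hello"],          # linking into an executable
    ]
    for step in steps:
        subprocess.run(step, cwd=work, check=True)
    # The executable runs on its own; hello.c is no longer needed.
    done = subprocess.run([str(work / "hello")], capture_output=True,
                          text=True, check=True)
    return done.stdout


with tempfile.TemporaryDirectory() as tmp:
    output = build_in_stages(tmp)
    print(output)
```

In everyday use a single command such as `cc hello.c -o hello` performs all four stages in one go.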
Java – Compiler generates Bytecode
Let us look at the execution cycle of Java code. The Java compiler compiles the source code to bytecode. Please note that bytecode is not machine code and requires further processing.
Also note that bytecode is platform-independent, and that is what makes Java a platform-independent language. You write the code once and execute it everywhere. But we still need something platform-specific, and that is where the Java Virtual Machine (JVM) comes into the picture.
You need a platform-specific JVM to read the bytecode and run it on the machine. Without a JVM, the code will not execute. In its simplest mode, the JVM picks up bytecode instruction by instruction and interprets it. This means that the JVM acts like an interpreter for bytecode, which in turn is compiled from the source code.
Modern JVMs also implement a Just in Time (JIT) compiler that compiles frequently executed Java bytecode into machine code at run time. This increases the efficiency of code execution and makes Java programs performant.
Python – Just Interpreter or Compiler too?
We will keep it short! Python, in its default implementation (CPython), mostly follows the pattern of Java, and its code is also compiled first to bytecode. The big difference is that this is done automatically behind the scenes, at run time. The .py file written by the developer contains the source code, and for imported modules Python caches the bytecode in an auto-generated .pyc file.
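The same step can be triggered by hand with the standard `py_compile` module, and the bytecode can be inspected with `dis`. This is a small sketch assuming CPython; the file name `greet.py` and its contents are made up for the example.

```python
import dis
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = pathlib.Path(tmp) / "greet.py"
    source_path.write_text("message = 'hi' * 2\n")

    # Do explicitly what Python does automatically for imported modules.
    pyc_path = pathlib.Path(py_compile.compile(str(source_path)))
    pyc_suffix = pyc_path.suffix
    pyc_exists = pyc_path.exists()
    print(pyc_suffix)  # .pyc

# The bytecode can also be inspected without writing anything to disk.
dis.dis(compile("message = 'hi' * 2", "<example>", "exec"))
```

By default the cached file is placed in a `__pycache__` directory next to the source file.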
Similar to Java, Python has the Python Virtual Machine (PVM), which interprets the bytecode at run time. Some implementations, such as PyPy, add a Just in Time compiler as well.
Moreover, Python has multiple implementations, such as Jython and IronPython. In the case of Jython, the .py code is compiled to Java bytecode that runs on the JVM instead of the PVM (the default for Python). IronPython, on the other hand, runs Python in the .NET environment.
I hope you got some insights into the compiler vs interpreter debate, understood the differences and similarities, and got a fair idea about JIT and code converters.
Thanks for reading, please share your views via comments!