Why are compilers hard to build?

But in many cases, semantic analysis and additional code-generation mechanisms will be required to cope with gaps between the languages (e.g. a source-language feature with no direct equivalent in the target). The statement was probably about compilers that compile to machine code. If you include compilers that translate one high-level language to another, it might be a lot easier.

Compiling to a platform like .NET is probably relatively easy. But compiling a high-level language to efficient machine code is non-trivial. Many high-level languages do as you suggest: they compile to C and then use an existing C compiler to generate machine code. This is exactly to avoid the complexity of compiling to machine code, and to reuse an existing compiler instead. Furthermore, C compilers exist for a lot of architectures, so you get wide reach "for free".

Compared to this, writing an interpreter is relatively easy, because you can lean on features of the underlying language, so writing one can be fairly simple. Interpreter or compiler, though, you have to analyse the source code and figure out what each statement means. In some languages the effort for that is significant, and in those languages, while writing an interpreter might be cheaper, it isn't that much cheaper.

But you don't just want "an interpreter" or "a compiler": you want something that runs code quickly. With an interpreter, the speed at which you can run code is severely limited unless the language operates at a very high level. Even the simplest compiler will produce code that an interpreter would need enormous effort to match.

So if you have a requirement for speed, I would conjecture that whatever your other requirements are, a compiler will meet that speed requirement more cheaply than an interpreter.


Is it really always easier to write an interpreter than a compiler? Some recent experience has provided me insight as to why this is the case, and it proved quite interesting! I recently completed work on a big new feature in ShipReq, and I found this piece of work an order of magnitude harder than I expected.

What is the feature? It might not sound like it relates to writing a compiler, nor do requirements, bug reports, and project documents seem very relatable to code. But there are surprising similarities: if you squint a bit, source code is kind of like that too.

When we start considering relationships like imports, or methods calling other methods, it becomes many-to-many as well, and we get a cyclic graph. Secondly, this new feature in ShipReq is very configurable by users. Obviously, having users reconfigure everything all the time would be a terrible burden, so we provide pre-configured, best-practice setups which users can optionally adopt, and then reconfigure without restriction to best serve their needs.

What that means from a coding point of view is that we never know the automation rules at compile-time. The code needs to understand and apply all the rules at runtime, and it needs to be able to recognise and handle combinations of data and logic that violate invariants. This too is like writing a compiler. Whereas ShipReq might have a UI page for configuration, in Java you have modifier keywords, such that public class Hello is different from private class Hello.

From the perspective of writing a compiler, these are runtime values with runtime rules. When we consider additional keywords like final and abstract, we start getting into what I meant about combinatorial explosions. The Java compiler, at its own runtime, has to ensure all possible combinations are valid. For example, it will reject code like the following:
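To illustrate with one representative combination (my own example, not necessarily the original snippet): a class cannot be both abstract, which requires subclassing, and final, which forbids it, so javac rejects the pair outright.

    // Rejected by javac with "illegal combination of modifiers: abstract and final":
    // an abstract class exists to be subclassed, while final forbids subclassing.
    final abstract class Hello {
        abstract void greet();
    }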

Imagine you were writing your own Java compiler: you would have to know every one of these rules, and detect and report every possible violation. The private and sealed keywords have been in Scala forever, and someone only recently came up with the realisation that private classes are effectively sealed. As for language features, type-safety is a pillar that I depend on all the time.
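That insight has a rough Java analogue, sketched below on my own initiative rather than taken from the original discussion: a class whose only constructor is private can only be extended by classes nested inside it, so its set of subclasses is closed, effectively sealed without the keyword.

    // Effectively sealed without the 'sealed' keyword: the private constructor
    // means only classes declared inside Shape's own body can extend Shape.
    abstract class Shape {
        private Shape() {}

        static final class Circle extends Shape {
            final double radius;
            Circle(double radius) { this.radius = radius; }
        }

        static final class Square extends Shape {
            final double side;
            Square(double side) { this.side = side; }
        }
    }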

I mean more the type systems in languages like Scala and Haskell; think features like existential and dependent types. They add value! But here, at compile-time, every runtime value is acceptable, because there always exists a configuration that would allow each possible value to be legal.

Now consider actually writing a small compiler yourself. Say your compiler will produce x86 assembly. We can ignore most of the assembler boilerplate; the most important point is that when a function returns, the EAX register will contain its return value. For a program that does nothing but return a constant, the only thing that changes in the generated assembly is that one value.

So one very simple approach would be to use a regular expression to extract the return value from the source code, then plug it into the assembly, as in the sketch below.
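Here is a minimal sketch of that approach in Java. The supported source shape (a lone main returning an integer constant), the regex, and the assembly template are all my assumptions for illustration; the template follows the return-value-in-EAX convention described above.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class NaiveCompiler {
        // Matches e.g. "int main() { return 2; }" and captures the constant.
        private static final Pattern PROGRAM = Pattern.compile(
            "int\\s+main\\s*\\(\\s*\\)\\s*\\{\\s*return\\s+(\\d+)\\s*;\\s*\\}");

        static String compile(String source) {
            Matcher m = PROGRAM.matcher(source.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("not a program this 'compiler' understands");
            }
            // Plug the captured constant into the template; the value lands in
            // EAX, which by convention holds the function's return value.
            return "    .globl main\n"
                 + "main:\n"
                 + "    movl    $" + m.group(1) + ", %eax\n"
                 + "    ret\n";
        }

        public static void main(String[] args) {
            System.out.print(compile("int main() { return 2; }"));
        }
    }

Change the constant in the source string and only the movl operand changes, which is exactly why the regex trick works at all, and also why it stops working the moment the language grows.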

The lexer (also called the scanner or tokenizer) is the phase of the compiler that breaks up a string, the source code, into a list of tokens. A token is the smallest unit the parser can understand; if a program is like a paragraph, tokens are like individual words. Many tokens literally are individual words, separated by whitespace. Variable names, keywords, constants, and punctuation like braces are all examples of tokens. Note that some tokens have a value (an integer literal token, for instance, carries the integer it represents) while punctuation tokens do not. Also note that there are no whitespace tokens here; in some languages, like Python, whitespace is significant and you do need tokens to represent it.

Your lexer needs to recognize a small, fixed set of tokens, each defined by a regular expression; a plausible set appears in the sketch below. Write a lex function that accepts a file and returns a list of tokens. It should work for all stage 1 examples in the test suite, including the invalid ones: the invalid examples should raise errors in the parser, not the lexer. To keep things simple, we only lex decimal integers. If you like, you can extend your lexer to handle octal and hex integers too.
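A minimal sketch of such a lex function in Java. The token set and regular expressions here are my assumptions, chosen to cover the kinds of programs shown above: braces, parentheses, semicolons, the int and return keywords, identifiers, and decimal integer literals.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Lexer {
        // One named alternative per token kind; alternatives are tried left to right.
        private static final Pattern TOKEN = Pattern.compile(
              "(?<brace>[{}])"
            + "|(?<paren>[()])"
            + "|(?<semicolon>;)"
            + "|(?<keyword>\\b(?:int|return)\\b)"
            + "|(?<identifier>[a-zA-Z_]\\w*)"
            + "|(?<integer>[0-9]+)"        // decimal integers only, to keep it simple
            + "|(?<whitespace>\\s+)");     // consumed, but no token is emitted

        record Token(String kind, String text) {}

        static List<Token> lex(String source) {
            List<Token> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(source);
            int pos = 0;
            while (pos < source.length()) {
                // Every character must belong to some token; anything else is an error.
                if (!m.find(pos) || m.start() != pos) {
                    throw new IllegalArgumentException("unrecognized input at offset " + pos);
                }
                for (String kind : List.of("brace", "paren", "semicolon",
                                           "keyword", "identifier", "integer")) {
                    if (m.group(kind) != null) {
                        tokens.add(new Token(kind, m.group(kind)));
                        break;
                    }
                }
                pos = m.end();
            }
            return tokens;
        }

        public static void main(String[] args) {
            lex("int main() { return 2; }").forEach(t ->
                System.out.println(t.kind() + " \"" + t.text() + "\""));
        }
    }

Note the design choice: whitespace is matched so the scan can move past it, but it never becomes a token, matching the observation above that there are no whitespace tokens.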

At this stage the language just has a negation operator, which can be applied to positive integers. The next step is transforming our list of tokens into an abstract syntax tree. An AST is one way to represent the structure of a program. In most programming languages, language constructs like conditionals and function declarations are made up of simpler constructs, like variables and constants. ASTs capture this relationship: the root of the AST is the entire program, and each node has children representing its constituent parts.
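As a sketch of what those nodes might look like (the type names and shapes here are my assumptions, not a prescribed design), the one-function programs above could be modelled with a few records:

    import java.util.List;

    // Hypothetical AST node types for the tiny language sketched above.
    sealed interface Expr permits Constant, Negation {}
    record Constant(int value) implements Expr {}
    record Negation(Expr operand) implements Expr {}

    sealed interface Stmt permits Return {}
    record Return(Expr value) implements Stmt {}

    record Function(String name, List<Stmt> body) {}
    record Program(Function main) {}   // the root node is the whole program

    class AstDemo {
        public static void main(String[] args) {
            // AST for: int main() { return -2; }
            Program p = new Program(new Function("main",
                List.of(new Return(new Negation(new Constant(2))))));
            System.out.println(p);
        }
    }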

Consider an if statement, for example. It will have three children: the condition, the if body, and the else body. Each of these components can be broken down further. The if body, unlike the condition, can have an arbitrary number of children: each statement in it is a child node.
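Extending the hypothetical node types from the previous sketch, but kept self-contained here, an if node with exactly those three children might look like this (treating a missing else as an empty list is my assumption):

    import java.util.List;

    // Minimal stand-ins so this sketch compiles on its own.
    interface Expr {}
    interface Stmt {}
    record Constant(int value) implements Expr {}
    record Return(Expr value) implements Stmt {}

    // An if statement has exactly three children, as described above.
    record If(Expr condition,         // a single expression
              List<Stmt> ifBody,      // arbitrarily many statements
              List<Stmt> elseBody)    // empty when there is no else clause
            implements Stmt {}

    class IfDemo {
        public static void main(String[] args) {
            // AST fragment for: if (1) { return 2; } else { return 3; }
            Stmt s = new If(new Constant(1),
                            List.of(new Return(new Constant(2))),
                            List.of(new Return(new Constant(3))));
            System.out.println(s);
        }
    }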


