huozhi.im

About Compilation


Talk About Compilation

Building a hello-world C program is probably the first time most of us hear about compiling. Yet the deeper we dig into the concepts behind it, the less approachable compilation seems, as if only nerds understand what a compiler does while it runs. When I was a senior in college, the compiler fundamentals course frightened me a lot.

Recently, I found a project called the-super-tiny-compiler. It is awesome: it walks you through the basic concepts and steps of a compiler.

Structure of Compiler

Take a look at the picture below. These parts form a compiler: the front end, which covers the tokenizer and syntax analysis; the intermediate state, the AST; and the back end, which is responsible for code generation.

compiler-construction

the-super-tiny-compiler is built from four functions:

  1. Tokenize the source code, turning the string into tokens.
  2. Parse the tokens into an AST structure.
  3. Transform the AST (preparation for code generation).
  4. Generate the final code.

You can easily follow the steps of compilation by reading its source code, which comes with well-written explanatory comments.
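To make the four steps concrete, here is a minimal sketch of such a pipeline. It is not the actual source of the-super-tiny-compiler; the function names, the node shapes, and the toy Lisp-like input are assumptions chosen for illustration.

```javascript
// Toy four-stage pipeline (illustrative only, not the project's real code).

// 1. Tokenizer: source string -> flat list of tokens.
function tokenize(input) {
  return (input.match(/[()]|\d+|[a-z]+/g) || []).map(text => {
    if (text === '(' || text === ')') return { type: 'paren', value: text }
    if (/^\d+$/.test(text)) return { type: 'number', value: text }
    return { type: 'name', value: text }
  })
}

// 2. Parser: tokens -> Lisp-style AST.
function parse(tokens) {
  let pos = 0
  function walk() {
    const token = tokens[pos++]
    if (token.type === 'number') return { type: 'NumberLiteral', value: token.value }
    if (token.type === 'paren' && token.value === '(') {
      const node = { type: 'CallExpression', name: tokens[pos++].value, params: [] }
      while (pos < tokens.length && tokens[pos].value !== ')') node.params.push(walk())
      if (pos >= tokens.length) throw new SyntaxError('Missing ")"')
      pos++ // skip the closing ")"
      return node
    }
    throw new TypeError('Unexpected token: ' + token.value)
  }
  const program = { type: 'Program', body: [] }
  while (pos < tokens.length) program.body.push(walk())
  return program
}

// 3. Transformer: Lisp-style AST -> JavaScript-style AST.
function transform(node) {
  switch (node.type) {
    case 'Program':
      return {
        type: 'Program',
        body: node.body.map(child => ({ type: 'ExpressionStatement', expression: transform(child) })),
      }
    case 'CallExpression':
      return {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: node.name },
        arguments: node.params.map(transform),
      }
    case 'NumberLiteral':
      return { type: 'NumberLiteral', value: node.value }
    default:
      throw new TypeError('Unknown node type: ' + node.type)
  }
}

// 4. Code generator: JavaScript-style AST -> output string.
function generate(node) {
  switch (node.type) {
    case 'Program': return node.body.map(generate).join('\n')
    case 'ExpressionStatement': return generate(node.expression) + ';'
    case 'CallExpression':
      return generate(node.callee) + '(' + node.arguments.map(generate).join(', ') + ')'
    case 'Identifier': return node.name
    case 'NumberLiteral': return node.value
    default: throw new TypeError('Unknown node type: ' + node.type)
  }
}

function compile(input) {
  return generate(transform(parse(tokenize(input))))
}

console.log(compile('(add 2 (subtract 4 2))')) // prints: add(2, subtract(4, 2));
```

Each stage only knows about the data structure handed to it by the previous one, which is why the stages can be read and tested separately.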

If you developed front-end apps around 2016, you probably know Babel well. Babel is a famous compiler for JavaScript that supports many features, letting you parse ES6, JSX, and Flow-style syntax. It then transforms the code so that the final output works in the environment you target.

Let's Talk About Babel

Babel has a large ecosystem that provides syntax presets, specific transform plugins, and the parser itself. babylon is the parser working behind Babel. You can get the AST of a piece of code by parsing it with babylon. That is really easy; just follow the tutorial in babel-plugin-handbook.

The most mysterious part is what the AST looks like. In the babylon syntax tree format, a simple function like

function square(n) { return n * n }

is parsed as:

- FunctionDeclaration:
  - id:
    - Identifier:
      - name: square
  - params [1]
    - Identifier
      - name: n
  - body:
    - BlockStatement
      - body [1]
        - ReturnStatement
          - argument
            - BinaryExpression
              - operator: *
              - left
                - Identifier
                  - name: n
              - right
                - Identifier
                  - name: n

See it? It's easy. The AST is one big object that you can walk with a depth-first traversal. It's just a tree that records how each piece of code nests inside another: nodes on the same level are siblings, and the deeper a node sits, the smaller the piece of code it covers.
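The depth-first walk can be sketched without any Babel package. The AST below is a hand-written, simplified version of the tree shown above (a real babylon AST is wrapped in File and Program nodes and carries location data), and the walk helper is a hypothetical name, not a Babel API.

```javascript
// Hand-written, simplified AST for: function square(n) { return n * n }
const ast = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'square' },
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BlockStatement',
    body: [{
      type: 'ReturnStatement',
      argument: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Identifier', name: 'n' },
        right: { type: 'Identifier', name: 'n' },
      },
    }],
  },
}

// Depth-first, parents before children: call visit() on every object
// that has a string `type` field, then descend into its properties.
function walk(node, visit, depth = 0) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visit, depth))
    return
  }
  if (node === null || typeof node !== 'object') return
  if (typeof node.type === 'string') visit(node, depth)
  Object.keys(node).forEach(key => walk(node[key], visit, depth + 1))
}

// Collect an indented outline of the node types in visiting order.
const visited = []
walk(ast, (node, depth) => visited.push('  '.repeat(depth) + node.type))
console.log(visited.join('\n'))
```

The outline starts at FunctionDeclaration and ends at the two Identifier operands of the multiplication, the deepest nodes of the tree.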

What about the whole compilation pipeline? I wrote a demo.

// babel tool set
const babylon = require('babylon')
const t = require('babel-types')
const traverse = require('babel-traverse').default
const generate = require('babel-generator').default

// source code
const jsxCode = `
<div>
  <img src="a.jpg" />
  <span>hello</span>
</div>
`

// parse ast
const ast = babylon.parse(jsxCode, {plugins: ['jsx']})

// traversal transform: rename HTML tags to React Native components
traverse(ast, {
  enter(path) {
    const {node} = path
    // babel-types provides type checks for every node kind
    if (t.isJSXIdentifier(node)) {
      if (node.name === 'div') {
        node.name = 'View'
      } else if (['span', 'p', 'h1', 'h2', 'h3'].includes(node.name)) {
        node.name = 'Text'
      }
    }
  }
})

// code generation
const gened = generate(ast, null, jsxCode)

// result
console.log(gened.code)

What I want to do here is transform simple ReactJS code into React Native-like code. Instead of replacing the keywords by hand, I use the Babel toolset. Let's walk through it.

As you can see above, I supply the source JSX code, parse it into a babylon AST, transform that tree, and finally generate the code with Babel itself.

The work includes four parts:

  1. Use babel-types to identify the specific nodes you would like to transform.
  2. Use babylon to parse code into an AST.
  3. Use babel-traverse to traverse the whole AST and apply the transformation.
  4. Output the final code with babel-generator.

These libraries are famous in the Babel ecosystem. They are your armory: the weapons for taking code apart structurally. If you want to learn more, take a deep look at the babel-handbook repo.

For now, you have just finished the same work that "the-super-tiny-compiler" does: parsing, traversing, transforming, generating. I like it very much because it is much easier for me to understand than what I learned about compiling a C++ program. There is no intermediate assembly code, just input to output.

Extensions

After reading this post and the source code of the tiny compiler, you might have many ideas for transformations in your own field. Let me give an example. I'm a front-end developer now, and I write advanced CSS every day. People may mention keywords like "SASS, PostCSS", tools that let people write less vanilla CSS. How do they work? They recognize the extended syntax and then output pure CSS code.

Think about the above. You'll find that the same routine gets the job done. Advanced CSS is just another syntax for CSS, like ES6 to ES5. What you need is a syntax parser that can construct an AST for the CSS-like syntax, and a generator that produces code from the given rules.
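As a rough illustration of that routine, here is a toy preprocessor for a made-up "advanced CSS" with $variables. The syntax, the AST shape, and every function name are assumptions for this sketch; this is not how SASS or PostCSS are implemented.

```javascript
// Toy "advanced CSS" source with $variables (hypothetical syntax).
const source = `
$gap: 25px;
$muted: #8e8e8e;
.action { margin-right: $gap; color: $muted; }
.button { margin-right: $gap; }
`

// 1. Parse: source text -> AST holding variable declarations and rules.
function parse(input) {
  const ast = { type: 'stylesheet', variables: [], rules: [] }
  const statement = /\$([\w-]+)\s*:\s*([^;]+);|([^{}$]+)\{([^}]*)\}/g
  let m
  while ((m = statement.exec(input)) !== null) {
    if (m[1] !== undefined) {
      ast.variables.push({ type: 'variable', name: m[1], value: m[2].trim() })
    } else {
      const declarations = m[4].split(';').map(s => s.trim()).filter(Boolean).map(text => {
        const colon = text.indexOf(':')
        return {
          type: 'declaration',
          property: text.slice(0, colon).trim(),
          value: text.slice(colon + 1).trim(),
        }
      })
      ast.rules.push({ type: 'rule', selector: m[3].trim(), declarations })
    }
  }
  return ast
}

// 2. Transform: replace every $name in a value with the variable's value.
function transform(ast) {
  const table = new Map(ast.variables.map(v => [v.name, v.value]))
  return {
    type: 'stylesheet',
    rules: ast.rules.map(rule => ({
      ...rule,
      declarations: rule.declarations.map(d => ({
        ...d,
        value: d.value.replace(/\$([\w-]+)/g, (whole, name) =>
          table.has(name) ? table.get(name) : whole),
      })),
    })),
  }
}

// 3. Generate: AST -> vanilla CSS text.
function generate(ast) {
  return ast.rules.map(rule => {
    const body = rule.declarations.map(d => `  ${d.property}: ${d.value};`).join('\n')
    return `${rule.selector} {\n${body}\n}`
  }).join('\n\n')
}

const css = generate(transform(parse(source)))
console.log(css)
```

The output is plain CSS in which every $gap has become 25px and every $muted has become #8e8e8e. A regex-based parser like this one breaks on nested blocks and comments, which is one reason real tools use a proper tokenizer.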

Let's see an example.

Take the CSS source code below, written in vanilla CSS syntax. How about translating it into a React Native stylesheet?

/* ItemActions */
.action {
  margin-top: 10px;
  margin-right: -25px;
  color: #8e8e8e;
}

.button {
  margin-right: 25px;
}

.icon {
  margin-left: -3px;
  fill: rgb(192, 192, 192);
}

.number {
  margin-left: -.3em;
}

We depend on a parser module called css, which you can find on npm.

const fs = require('fs')
const {parse} = require('css')
const log = console.log

const cssFilename = process.argv[2]

const css = fs.readFileSync(cssFilename, 'utf8')

// parse content to AST, including tokenizer
const ast = parse(css)

const rules = ast.stylesheet.rules.filter(rule => rule.type === 'rule')

function camelCase(str) {
  return str.replace(/-([a-z])/g, (match, p1) => p1.toUpperCase())
}

function letterCase(str) {
  const camel = camelCase(str)
  return camel[0].toUpperCase() + camel.slice(1)
}

function renameSelectorToKey(selector) {
  return letterCase(selector.replace(/\.|#/g, ''))
}

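function transformValue(value) {
  // Convert a single numeric length such as "10px", "-.3em" or "1.5px"
  // to a number; leave colors, keywords and shorthands unchanged.
  const match = /^(-?\d*\.?\d+)(px|em)?$/.exec(value.trim())
  return match ? parseFloat(match[1]) : value
}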

const ruleMap = rules.map(rule => {
  return {
    name: renameSelectorToKey(rule.selectors[0]),
    declarations: rule.declarations.reduce((m, declare) => {
      m[camelCase(declare.property)] = transformValue(declare.value)
      return m
    }, {}),
  }
})

// transform AST

const styleSheet = ruleMap.reduce((m, rule) => {
  m[rule.name] = Object.assign({}, rule.declarations)
  return m
}, {})

// Code Generation

const styleTemplate = (styleSheet) => {
  const content = JSON.stringify(styleSheet, null, 2)
    .replace(/\"([^(\")"]+)\":/g,"$1:")
    .replace(/\"/g, '\'')
  return `module.exports = require(\'react-native\').StyleSheet.create(${content})`
}

log(styleTemplate(styleSheet))

Finally, we get:

module.exports = require('react-native').StyleSheet.create({
  Action: {...},
  Button: {...},
  Icon: {...},
  Number: {...},
})

That is what we expect from the compilation process.

Conclusion

You may know a little more about compilation after these two examples of compiling JSX and CSS. The concept can be used almost everywhere: humans use compiled code to talk to machines, to write easier code, and so on. Keep this knowledge around; it may help you a lot. :)