Procedural macros are relatively new. Because of that it’s quite possible you’ll encounter a few compilation errors.

Normally, in Rust, the compiler helps you out effortlessly, but the lack of properly understanding macro hygiëne and token spanning will result in confusing errors anyway.
I took the leap and wrote my own procedural macro, which is now released on crates.io. That macro will be the subject of these posts while I try to explain and provide solutions for compilation errors.
The complete source of this macro can be found on github, if you’re interested.

Ready to go?

Make sure you have the Rust Nightly Toolchain installed, which can be done through the Rustup tool. This post also presumes you have already minimal knowledge of the Rust programming language and the Cargo tool.

A nightly compiler is not necessary per se, but it allows me to show off some cool features which haven’t arived on stable yet.

Starting off

A procedural macro must reside in it’s own crate. This ‘macro-crate’ is only allowed to export macros and that’s, in my opinion, already an unusual constraint. Right off the bat we need to work around this structural limitation by declaring two crates; 1 for all our macros and 1 for dependant types. Let us call them ‘crate M’ and ‘crate T’ respectively.
The crate containing your application code, which we’ll call ‘crate A’, will have both crate M and T as dependancy.
DO NOT make the macro crates dependant on each other or your application crate, these links will result in compilation errors because of circular dependancies. When necessary we can make additional dependancy links inside the expanded code from our macro.

Building a procedural macro crate is quite simple. Inside the Cargo.toml file you add the following code:

[lib]
# Changes the build target
proc-macro = true

Procedural macro template

Throughout this post I’ll use the terms ‘procedural macro’, ‘procedural macro attribute’ and ‘macro method’ so I’d like to take a moment first and explain each of them.
A procedural Macro is essentially a compiler extension which you can write yourself in Rust code. These macros allows for automatic code generation, constraints checking and more. Since it’s “just Rust code” you can make it do anything by utilising the standard library or external crates. Possibilites are endless.
A procedural macro Attribute is a kind of procedural macro which is invoked using a specific syntax. These macros are attributed to your Rust code and will interact with, or rather manipulate, that code.
A macro Method is the method called by the Rust compiler when a procedural macro is resolved (or called). These methods communicate with the Rust compiler and contain the logic of a macro.

For now we’ll only focus on the crate which we marked with proc-macro = true, crate M. We’ll start by declaring and exporting our own procedural macro attribute at the root of that crate. Consider the following code:

#![feature(proc_macro)]

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn value_from_type(args: TokenStream, input: TokenStream) -> TokenStream {
    println!("[BUILD] Running proc macro");
    TokenStream::empty()
}

That’s a minimal code to get started writing a macro, let’s explain what each line does.
#![feature(proc_macro)] indicates to the compiler that we want to enable the procedural macro feature (dead link).
Note that BOTH the procedural macro crate AND the application crate MUST have this feature enabled for the macro to work.

Following the above link will guide you to the -Unstable Book- containing lots of information about nightly compiler features. It’s a good read for more advanced Rustaceans.

The next lines, starting with extern crate, should be obvious. We explicitly import our dependancy crates to be used in our own code. In this case the crate ‘proc_macro’ is imported, which is made available by the compiler because we’re targetting to build a procedural macro.
The use statement simply re-imports the TokenStream item into our current module.

Our macro is called value_from_type, which matches the identifier of the method exactly. This is important because there is NO other way to name our procedural macro.
It’s attribute #[proc_macro_attribute] registers the method, so it’s called when the attribute is resolved.
The method takes two arguments, both typed TokenStream, which is an agreed upon interface for each procedural macro attribute. args contains the arguments passed into the attribute call. input contains code from the subject, this is the code attached to the attribute.

This macro method does exactly one thing, printing to the standard outstream when it’s executed. You’ll see this message printed by the compiler, on the standard outstream, when the procedural macro attribute is used within application code.

Attribute arguments

Procedural macro attributes are attached to Items. Note that Item is used here in syntactic context. A Struct, Enum or Method are Items, even a Module like the example below.
When attaching an attribute to an Item you can pass it one or more arguments. The arguments could be anything actually because we’re responsible of parsing them ourselves.
Taking it to the extreme we could even write Rust code as an argument to our attribute macro!

// Here we are calling the macro value_from_type with arguments `arg1, arg2, arg3`
#[value_from_type(arg1, arg2, arg3)]
mod item_mod {}

// Here we are calling the macro value_from_type with arguments `"1=Monkey | name=Barry"`
// 
// Note the difference in attribute placement, this form is equivalent to the previous example
// but here we used an INNER ATTRIBUTE.
mod item_mod_2 {
	// Note '#!' at the start of the line
    #![value_from_type(1=Monkey | name=Barry)]
}

As you can see, anything can be passed down as argument, as long as you are able to parse it.

Attribute input

The input of a procedural macro attribute is the Item attached to it. The compiler will check for syntax errors and pass the entire Item into the macro method. This means that the original code is effectively removed from the file during compilation.
It’s expected from the attribute macro to transform the input and return a some output back into the compiler. Most of the time it’s desired that the input is copied into the output.

There is ONE important detail; invoked macro attributes are removed from the input! This is important to make sure that the macro expansion process is finite.

// Our attribute value_from_type is linked to Item item_mod
#[value_from_type()]
// The entire module code, including the mod keyword and braces, is passed 
// into the macro value_from_type as input
// <--
    mod item_mod {
        struct Test;
    }
// -->

This example shows what’s effectively being passed into the macro as input. Note that this scenario is similar when using inner attributes.

Procedural macro usage

Let’s continue with the provided macro method example since I haven’t explained what TokenStream::empty() means.
As mentioned earlier the compiler will pass down the code of the Item attached to the attribute being executed as input. The macro method is expected to return Rust code to replace what was originally passed as input argument.
As for the example, it’s actually returning nothing (= no code). This results in all attributed code to disappear during compilation.

“But what is a TokenStream exactly?”, you might ask. To explain this I need to start with how modern compilers work. As you might have figured there are a lot of steps necessary to transform some code into machine instructions. To not reinvent the wheel every time another language or system platform is built modern compilers are seperated into a ‘frontend’ and ‘backend’ system. The frontend is responsible for processing the programming language and transforming it into some intermediate format. The backend will transform that intermediate format into the desired machine code. All components of the compiler are free to analyze and manipulate it’s data, but operations performed on the intermediate code representation can be in every context easily reused.
The Rust compiler (rustc) is actually a frontend of the compiler system, with LLVM as as backend component for generating machine code. The intermediate code representation is LLVM IR in this case.

Back to TokenStream then, it’s basically an intermediate representation between rustc and procedural macros. This approach of processing and returning TokenStreams was chosen because rustc internally makes use of these tokens because they represent the Abstract Syntax Tree (AST). The AST-data is produced by parsing Rust code.
Exposing an intermediate representation allows the rustc developers to limit exposure of the compiler internals to procedural macros. Allowing procedural macros to straight up use internal compiler code would constraint developers from making breaking changes to the compiler. This would break all existing procedural macros.

Building the macro

The next step is to actually write code inside the macro method. Of course we want our macro attribute to do something useful, that’s why in the next post I’ll describe how and why my value_from_type macro works.

Happy hacking!

Contents

Creating a procedural macro in Rust Pt. 1