Game Scripting Languages | Game Coding Complete

You might be surprised to learn that C++ isn't the end-all, be-all language for developing games. C++ is great for implementing flexible and elegant technology. It's also good for speed. But as with any language, C++ has its downsides that can really bug game developers. I continually find that C++ is lacking features that I need to make games. Ultimas of every flavor used game scripts to cache huge amounts of localized text and script events—a cumbersome task in C++. Another feature lacking in C++ is reentrant code—a game script for something like an explosion effect might be played over a long period of time, and it's crazy to launch a thread to perform this simple task.

Game designers and level builders don't necessarily want a C++ compiler on their development machine. A game script exposes only those features that they need to build triggers, spawning points, and cut scenes. Even better, if the scripting language is interpreted rather than compiled, it's easy to get into a development zone. They can play it, hate it, tweak it, and replay it until they've got it the way they want it.

There are other languages besides C++, some of them with features like reentrancy, object-oriented programming, and more. If a scripting language doesn't exist with exactly the features you want, you can write your own. The game scripts can be thought of as procedural game data, and can describe anything: mission parameters, screen layouts, or character AI.

Using Scripts to Handle Text

Because scripting languages vary and are a little tricky to describe, let's look at an example. My first experience with using game scripting languages was with Martian Dreams, a game based on the Ultima VI technology. This game had a simple scripting language to handle conversations, a central task in role-playing games. Role-playing games tend to lead the player through a tree-shaped set of choices in conversation. Ultima VI used typed keywords for user input. If a character in the game said something like, "Go into the western mountains to find the magic key," you might expect to find out more by typing the word "mountain" or "key." The Ultima VI scripting language used a flag-based system to manage conversations:

    Character Gwenno
    {
       "Key":
          if (GetFlag(ToldAboutKey))
          {
            Say("Don't you remember? ");
          }
          Say("The key will open the door to the hidden room in the King's Armory.
               There you will find enchanted armor and weapons you can use to
               defeat the drunken mountain troll.");
          SetFlag(ToldAboutKey);
          PopKeyword("Key");
          PushKeywords("troll", "armor", "weapons", "enchanted");
       "Enchanted":
          Say("The spell was cast on the armor long ago...");
    }

You can see from this sample that Gwenno can tell you about a magic key and the spell that was placed on the King's armor and weapons. She can even remember that she told you about it. The flag, ToldAboutKey, was set the first time you asked Gwenno about the key. If you asked about it again later the conversation reflected the new game state.

This system was pretty easy to manage, but it was a little too simple. All the active keywords had the same priority, so Gwenno couldn't talk about more than one key. The flags were available to the C code and could be used to affect the game by unlocking passages and such, but more complex interactions between the script and the game engine were impossible. The system created fairly believable interactions with non-player characters, as simple as it was.
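To make the flag-and-keyword idea concrete, here is a minimal Python sketch of the same conversation. It is not the Ultima VI implementation: the Conversation class, its topic() decorator, and ask() are hypothetical names invented for illustration, and the sketch leaves out PopKeyword so that "key" stays active and the remembered reply can be seen.

```python
# Hypothetical sketch of a flag-and-keyword conversation system.
# None of these names come from the Ultima VI engine.

class Conversation:
    def __init__(self, keywords):
        self.flags = set()            # game-state flags that have been set
        self.active = set(keywords)   # keywords the player may currently ask about
        self.handlers = {}            # keyword -> function returning reply lines

    def topic(self, keyword):
        """Decorator that registers a handler for one keyword."""
        def register(func):
            self.handlers[keyword] = func
            return func
        return register

    def ask(self, keyword):
        # Unknown or not-yet-unlocked keywords get a stock reply.
        if keyword not in self.active or keyword not in self.handlers:
            return ["I know nothing of that."]
        return self.handlers[keyword](self)


gwenno = Conversation(["key"])

@gwenno.topic("key")
def about_key(conv):
    lines = []
    if "ToldAboutKey" in conv.flags:
        lines.append("Don't you remember?")
    lines.append("The key will open the door to the hidden room in the King's Armory.")
    conv.flags.add("ToldAboutKey")                                   # SetFlag
    conv.active.update(["troll", "armor", "weapons", "enchanted"])   # PushKeywords
    return lines

@gwenno.topic("enchanted")
def about_enchanted(conv):
    return ["The spell was cast on the armor long ago..."]
```

Asking about "enchanted" before "key" gets the stock reply, because the keyword only becomes active once Gwenno has mentioned the armor. Asking about "key" a second time prepends the "Don't you remember?" line, because the flag is already set.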

One feature that was added to the language by Ultima VIII was natural support for localization. The different languages were coded in line with the English text, and comments were used to flag any changes in the English text for the localization team. It was accomplished with a #define mechanism, which had the downside that the game scripts had to be compiled once for each language.

Another feature that was added by Ultima VIII was a tight integration with C++ code. The language compiler could read the MAP file output by the linker to figure out where and how to call free functions or even members of C++ classes. Of course, it was very compiler specific.

Event Scripting

Creating scripts to handle text processing can be really useful for certain types of games, but what about events? After all, most games are based on the processing of numerous, and often complex, events. When I worked on Ultima VIII, all of the programmers wanted a better way to script complicated events. In the previous conversation with Gwenno, imagine that after she told you about the key, she walked over to you and cast a spell that gave you additional protection on your quest for the key. We thought that the team of game designers could use the additional features in the game scripting language to create highly interesting and theatrical performances on the part of the non-player characters.

Since Ultima VIII was based on a real-time multitasker, we realized that each action of an NPC could be one of those processes, and the master process of the NPC could be the conversation process. The conversation process could spin off some animations as child processes and pick up where it left off after the animations were complete. Let's look at an example:

    Character Gwenno
    {
       "Key":
         if (GetFlag(ToldAboutKey))
         {
           Say("Don't you remember? ");
         }
         Say("The key will open the door to the hidden room in the King's Armory.
              There you will find enchanted armor and weapons you can use to
              defeat the drunken mountain troll.");
         Say("Let me help you by casting this protection spell. This won't hurt
              a bit, just hold still for a minute.");
         Avatar.SetInactive();
         Pathfind(Avatar, WALK);
         CastSpell(SPELL_Protection, Avatar);
         Say("All done! The protection spell only lasts a few hours, so hurry on.");
         SetFlag(ToldAboutKey);
         PopKeyword("Key");
         PushKeywords("troll", "armor", "weapons", "enchanted");
       "Enchanted":
         Say("The spell was cast on the armor long ago....");
    }

This implies something you might have suspected about game scripts, especially scripts that define "stage direction," for lack of a better term. They execute at the whim of user input and the processes they launch. In my example the call to Pathfind() doesn't return until the Gwenno character successfully walks to the Avatar's location. Any of these processes might fail. I'm sure of this because a very similar piece of code was flummoxed by Origin QA, who built a small wall of wooden boxes in between Gwenno and the Avatar. Gwenno cast the protection spell anyway and managed to protect the wall of wooden boxes instead of the Avatar.

Your scripting language must be able to detect these problems and handle them in some meaningful way. In my example Pathfind() could return a boolean signaling success or failure. If the Pathfind() process failed we had Gwenno say something appropriate like, "I should really clean this place up first. Would you help?"
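The check described above might look something like the following Python sketch. Everything in it is an assumption made for illustration: pathfind() is a plain breadth-first search over a small square grid, and key_scene() stands in for the part of Gwenno's script that walks and casts. The real Ultima VIII Pathfind() ran as a process and is not shown here.

```python
from collections import deque

def pathfind(blocked, start, goal, size=5):
    """Breadth-first search on a size x size grid; True if goal is reachable."""
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            cell = (nx, ny)
            in_bounds = 0 <= nx < size and 0 <= ny < size
            if in_bounds and cell not in blocked and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    return False

def key_scene(blocked, gwenno_pos, avatar_pos):
    """Hypothetical script fragment: only cast the spell if the walk succeeded."""
    lines = ["Let me help you by casting this protection spell."]
    if not pathfind(blocked, gwenno_pos, avatar_pos):
        lines.append("I should really clean this place up first. Would you help?")
        return lines
    lines.append("All done! The protection spell only lasts a few hours, so hurry on.")
    return lines
```

A wall of boxes spanning the grid, such as `{(2, y) for y in range(5)}`, makes pathfind() return False, so the scene ends with Gwenno's complaint instead of a spell cast on the boxes.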

Interpretation versus Compilation

Compiling and linking large C++ programs can test the patience of any programmer, especially if you're the kind of programmer who needs to get away from your machine now and then to play a game of Robotron. It would be nice if fast, new computers were issued the moment compile times averaged longer than 15 minutes. Like you, I too keep dreaming.

Interpreted game scripts take longer to load and they run a lot slower. That's not necessarily a bad thing depending on what your game scripts do. If they mostly control NPC conversations or launch other C++ processes, it's doubtful that your players will notice any performance problems in your interpreter. You'll also get a side benefit from interpretation. Since the script is loaded at run time you should be able to engineer the interpreter to reload the script without restarting the game. A game designer could change the script and rerun it for ultra fast development. If your system can do this, you'll save a lot of time and get a better game as a result.
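As a rough illustration of reloading a script without restarting the game, here is a small Python sketch. It assumes the "script" is itself a Python source file and uses exec() as a stand-in for a real script interpreter. ScriptHost, reload_if_changed(), and call() are hypothetical names, not part of any engine.

```python
import os

class ScriptHost:
    """Loads a script file and reloads it when the file changes on disk."""

    def __init__(self, path):
        self.path = path
        self.mtime = None      # modification time of the version we last loaded
        self.namespace = {}    # names defined by the script

    def reload_if_changed(self):
        """Reload the script if its modification time changed; return True if reloaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.mtime:
            return False
        with open(self.path, encoding="utf-8") as f:
            source = f.read()
        namespace = {}
        exec(compile(source, self.path, "exec"), namespace)
        # Swap in the new definitions only after the script ran without errors,
        # so a broken edit leaves the previous version in place.
        self.namespace = namespace
        self.mtime = mtime
        return True

    def call(self, name, *args):
        return self.namespace[name](*args)
```

A game loop could call reload_if_changed() once per frame, or only when a designer presses a debug key. Checking the modification time keeps the common case cheap.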

Compiling and linking game scripts might sound a little alien. If a game script is going to be compiled and linked, why not use C++ and be done with it? The answer is that a compiled and linked game script can still support features that C++ lacks, depending on your language syntax. Compiling is a translation process. The text file that represents your code and algorithms is compiled to pre-create the token stream of operators and operands, data structures, and symbol tables used in complicated languages. Linking takes the results of multiple compiled modules and assigns concrete addresses to external variables declared from module to module.

Strictly speaking, a language interpreter performs this sequence of actions on a line-by-line basis. If only a small number of lines of code are executed within a large module, interpretation will perform better than compilation. If the same lines are executed multiple times, compilation will begin to outperform interpretation. Compilation and linking will also be able to support language features like external variable references.
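Python's built-in compile() and eval() give a small-scale picture of this trade-off. The sketch below is only an analogy, not a model of the Ultima compilers: the expression and the variable names in it are made up. One function re-translates the source text every time the line runs, and the other translates it once and reuses the result.

```python
# A one-line "script" with made-up variable names.
source = "hit_points - damage * 2"

def run_interpreted(env):
    # The text is parsed and translated again on every call.
    return eval(source, {}, env)

# Translate once, up front, into a reusable code object.
compiled = compile(source, "<script>", "eval")

def run_compiled(env):
    # Only execution happens here; the translation cost was paid once.
    return eval(compiled, {}, env)
```

Both functions give the same answer for the same environment. The difference only shows up as time, and only when the same line is executed many times.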

Ultima VIII and the original Ultima IX supported a compiled and linked language.

Rolling Your Own with Lex and Yacc

Assuming your language syntax is nontrivial, you'll want to use some tools to help you define your lexicon and your grammar. There are two classic and excellent tools to help you perform this task:

Lex: A utility that generates C programs to be used in simple lexical analysis of text, or in plain language it matches input streams with a series of regular expressions.
Yacc: An acronym for Yet Another Compiler Compiler. This tool takes as its input a context free grammar and a stream of tokens and is able to break down the input stream into a hierarchical data structure. Each node of the hierarchy corresponds to a statement in the language defined by the grammar. Each statement is processed by a custom piece of code that translates the original token stream into your target language, usually a series of simple statements composed of operators and operands.

Both Lex and Yacc require you to associate a regular expression or grammar fragment with an action written in C. You are still responsible for writing the C code to output something that you can use to interpret the source script into a series of executable or interpretable commands and parameters.

Lex and Yacc are old tools, but they withstand the test of time. You can find numerous good resources on the web—simply search for "Lex Yacc" and you'll find all you need. You can also find an O'Reilly book on the subject: Lex and Yacc by John R. Levine, Tony Mason, and Doug Brown.

Regular Expressions

If you are rusty on regular expressions, now is a good time to bone up. Regular expressions are shorthand strings that stand for longer series of strings. Here are some examples:

    Hello[0123]   matches Hello0, Hello1, Hello2, Hello3 and nothing else
    Hello[A-Z]    matches any string from HelloA to HelloZ
    Hello[^0-9]   matches 'Hello' followed by any single character that is not a digit
    Hello$        matches 'Hello' only at the end of an input line
    [o]+          matches one or more consecutive instances of the letter 'o'
    [o]*          matches zero or more consecutive instances of the letter 'o'

There is much more to regular expressions than these simple examples. Go take a look at Mastering Regular Expressions by Jeffrey E. F. Friedl (published by O'Reilly & Associates, Inc.).
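If you want to experiment with the patterns above, Python's re module accepts the same syntax for these simple cases (Lex and Python differ in more advanced features). The two helper functions here, whole() and found(), are hypothetical conveniences written for this sketch.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """True when the pattern matches somewhere in the string."""
    return re.search(pattern, text) is not None

print(whole(r"Hello[0123]", "Hello2"))    # True
print(whole(r"Hello[0123]", "Hello4"))    # False
print(whole(r"Hello[A-Z]", "HelloQ"))     # True
print(whole(r"Hello[^0-9]", "Hello7"))    # False
print(found(r"Hello$", "Oh, Hello"))      # True
print(found(r"Hello$", "Hello there"))    # False
print(whole(r"[o]+", ""))                 # False
print(whole(r"[o]*", ""))                 # True
```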

Lex Example

Lex takes as input a series of regular expressions and C code fragments. Lex takes that input and creates a C source file, lex.yy.c, that will analyze any input file and perform the actions each time it finds a string that conforms with one of the regular expressions. Here is a sample input file for Lex:

    %%
    [A-Z]   putchar(yytext[0]+'a'-'A');
    [ ]+$
    [ ]+    putchar(' ');

This example has three regular expressions and their associated actions. The first regular expression matches any upper case character from A to Z. The action converts the character to lower case. The second expression matches any series of blanks appearing at the end of an input line. Since there is no action associated with it, the blanks are stripped and don't appear in the output file. The last regular expression matches any series of one or more blanks. The action replaces them with a single blank.
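For comparison, the same three rules can be approximated in Python with re.sub(). This is a sketch rather than a translation of what Lex generates: Lex scans the input once and picks the longest matching rule at each position, while the hypothetical clean() function below makes three separate passes over each line. For these particular rules the result is the same.

```python
import re

def clean(text):
    """Apply the three Lex rules to every line of the input."""
    out = []
    for line in text.split("\n"):
        line = re.sub(r"[ ]+$", "", line)    # blanks at end of line: no output
        line = re.sub(r"[ ]+", " ", line)    # any other run of blanks: one blank
        line = re.sub(r"[A-Z]", lambda m: m.group(0).lower(), line)  # A-Z to lower case
        out.append(line)
    return "\n".join(out)
```

For example, `clean("Hello   World   \nFOO  bar")` returns `"hello world\nfoo bar"`.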

You can define regular expressions for your scripting language, and convert them to tokens. Take a look at a small portion of a Lex file for parsing ANSI C (every keyword shown is also a C++ keyword):

    D        [0-9]
    L        [a-zA-Z_]
    H        [a-fA-F0-9]
    E        [Ee][+-]?{D}+
    FS       (f|F|l|L)
    IS       (u|U|l|L)*

    %{
    #include <stdio.h>

    /* token values; normally these come from the header that Yacc generates */
    # define AUTO 286
    # define BREAK 314
    # define CASE 304
    # define CHAR 288
    # define CONST 296
    # define CONTINUE 313
    # define CONSTANT 258
    # define STRING_LITERAL 259
    # define RIGHT_ASSIGN 278
    # define LEFT_ASSIGN 277
    # define ADD_ASSIGN 275
    # define SUB_ASSIGN 276
    # define MUL_ASSIGN 272
    # define DIV_ASSIGN 273
    # define MOD_ASSIGN 274

    int column = 0;   /* current column in the input, maintained by count() */

    void comment();
    void count();
    %}

    %%
    "/*"         { comment(); }

    "auto"       { count(); return(AUTO); }
    "break"      { count(); return(BREAK); }
    "case"       { count(); return(CASE); }
    "char"       { count(); return(CHAR); }
    "const"      { count(); return(CONST); }
    "continue"   { count(); return(CONTINUE); }

    0[xX]{H}+{IS}?       { count(); return(CONSTANT); }
    0{D}+{IS}?           { count(); return(CONSTANT); }
    \"(\\.|[^\\"])*\"    { count(); return(STRING_LITERAL); }

    ">>="      { count(); return(RIGHT_ASSIGN); }
    "<<="      { count(); return(LEFT_ASSIGN); }
    "+="       { count(); return(ADD_ASSIGN); }
    "-="       { count(); return(SUB_ASSIGN); }
    "*="       { count(); return(MUL_ASSIGN); }
    "/="       { count(); return(DIV_ASSIGN); }
    "%="       { count(); return(MOD_ASSIGN); }

    .          { /* ignore bad characters */ }
    %%

    /* consume input up to and including the closing comment marker */
    void comment()
    {
       char c, c1;

    loop:
       while ((c = input()) != '*' && c != 0)
         putchar(c);

       if ((c1 = input()) != '/' && c != 0)
       {
         unput(c1);
         goto loop;
       }

       if (c != 0)
         putchar(c1);
    }

    /* advance the column counter over the token just matched */
    void count()
    {
       int i;

       for (i = 0; yytext[i] != '\0'; i++)
         if (yytext[i] == '\n')
           column = 0;
         else if (yytext[i] == '\t')
           column += 8 - (column % 8);
         else
           column++;

       ECHO;
    }

Each regular expression associates a text string with a token. A token is an atomic and meaningful component of a language. All keywords, operators, constants, and identifiers are tokens. Notice the two functions comment() and count(). They are included verbatim in the Lex output, and handle more complicated parsing of the input stream into tokens. The comment() function simply finds the end of the comment so that it can be ignored. The count() function keeps track of the position of the parser in the input stream so that errors can be reported.
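The same idea, a table of regular expressions that each map to a token, can be sketched in Python without Lex. This is a simplified illustration under several assumptions: TOKEN_SPEC covers only a handful of the rules above, IDENTIFIER and the line and column bookkeeping are additions for the sketch, and tokenize() is a hypothetical name. Like the Lex file, it silently skips characters that match no rule.

```python
import re

# (token name, regular expression) pairs; earlier entries win on a tie.
TOKEN_SPEC = [
    ("COMMENT",        r"/\*.*?\*/"),
    ("CONSTANT",       r"0[xX][0-9a-fA-F]+|[0-9]+"),
    ("STRING_LITERAL", r'"(?:\\.|[^\\"])*"'),
    ("RIGHT_ASSIGN",   r">>="),
    ("LEFT_ASSIGN",    r"<<="),
    ("ADD_ASSIGN",     r"\+="),
    ("KEYWORD",        r"\b(?:auto|break|case|char|const|continue)\b"),
    ("IDENTIFIER",     r"[A-Za-z_][A-Za-z_0-9]*"),
    ("NEWLINE",        r"\n"),
    ("SKIP",           r"[ \t]+"),
    ("MISMATCH",       r"."),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets a comment span several lines
)

def tokenize(text):
    """Yield (kind, value, line, column) for each meaningful token."""
    line, line_start = 1, 0
    for m in MASTER.finditer(text):
        kind, value = m.lastgroup, m.group()
        column = m.start() - line_start + 1
        if kind == "NEWLINE":
            line += 1
            line_start = m.end()
        elif kind == "COMMENT":
            # Comments produce no token but may move us to a later line.
            if "\n" in value:
                line += value.count("\n")
                line_start = m.start() + value.rfind("\n") + 1
        elif kind in ("SKIP", "MISMATCH"):
            continue  # whitespace and bad characters are ignored
        else:
            if kind == "KEYWORD":
                kind = value.upper()  # "const" becomes the token CONST
            yield (kind, value, line, column)
```

Calling `list(tokenize("const x >>= 0x1F;"))` produces four tokens of kinds CONST, IDENTIFIER, RIGHT_ASSIGN, and CONSTANT. The semicolon matches no rule in this reduced table, so it is dropped.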

Yacc Example

The stream of tokens is sent on to Yacc, which will analyze the stream and match token sequences with a context free grammar. The grammar defines the structure of the scripting language.

Grammar construction is very tricky. The best reference is the "Dragon Book," Compilers: Principles, Techniques, and Tools by the team of Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Blockade yourself from any outside distractions while you soak this in; it's difficult material. Unless your language script is fairly simple or you can steal a grammar from a well-known language like C++ I suggest you find someone who has done this before.

Constructing a proper LR(1) grammar is as hard as those annoying blacksmith puzzles that seem to require movement into the fourth dimension. If you don't have any idea what an LR(1) grammar is, I rest my case. Go get the Dragon Book and start reading, because I'd be a horrible replacement trying to teach any of that here.

This is a small portion of a Yacc grammar definition file for ANSI C (C++ shares these statement forms):

    %token IDENTIFIER CONSTANT STRING_LITERAL SIZEOF
    %token PTR_OP INC_OP DEC_OP LEFT_OP RIGHT_OP LE_OP GE_OP EQ_OP NE_OP
    %token AND_OP OR_OP MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN ADD_ASSIGN
    %token SUB_ASSIGN LEFT_ASSIGN RIGHT_ASSIGN AND_ASSIGN
    %token XOR_ASSIGN OR_ASSIGN TYPE_NAME

    %token TYPEDEF EXTERN STATIC AUTO REGISTER
    %token CHAR SHORT INT LONG SIGNED UNSIGNED FLOAT DOUBLE CONST VOLATILE VOID
    %token STRUCT UNION ENUM ELIPSIS RANGE

    %token CASE DEFAULT IF ELSE SWITCH WHILE DO FOR GOTO CONTINUE BREAK RETURN

    %start file
    %%

    statement
       : labeled_statement
       | compound_statement
       | expression_statement
       | selection_statement
       | iteration_statement
       | jump_statement
       ;

    labeled_statement
       : identifier ':' statement
       | CASE constant_expr ':' statement
       | DEFAULT ':' statement
       ;

    compound_statement
       : '{' '}'
       | '{' statement_list '}'
       | '{' declaration_list '}'
       | '{' declaration_list statement_list '}'
       ;

    declaration_list
       : declaration
       | declaration_list declaration
       ;

    statement_list
       : statement
       | statement_list statement
       ;

    expression_statement
       : ';'
       | expr ';'
       ;

    selection_statement
       : IF '(' expr ')' statement
       | IF '(' expr ')' statement ELSE statement
       | SWITCH '(' expr ')' statement
       ;

Needless to say the real grammar is much larger, but not as large as you'd think. The example I found is only 433 lines of text. This grammar doesn't do anything useful until you add code for each line that creates semantic actions as shown here:

    primary_expr
       : identifier             { $$ = findIdent($1); }
       | CONSTANT               { $$ = $1; }
       | STRING_LITERAL         { $$ = findString($1); }
       | '(' expr ')'           { $$ = $2; }
       ;

Each grammar statement gets a semantic action, which can be any arbitrary C statement. In the example above, there are two custom functions: findIdent() and findString(). It's your responsibility to write the code for these functions and place them in the Yacc input file. The functions access a custom symbol table that you have to construct yourself. Depending on your language you might require the variables to be declared before they are used just like C++, and if they weren't you'd have to invoke some kind of error mechanism.
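To show how semantic actions and a symbol table fit together, here is a hand-written Python sketch of the primary_expr rule. It is not what Yacc produces: Yacc generates a table-driven bottom-up parser, while this sketch is a top-down recursive-descent routine, which is a different technique chosen because it fits in a few lines. The Parser class, ScriptError, and find_ident() are hypothetical, the symbol table is a plain dictionary, and the nested expr is assumed to be just another primary_expr.

```python
class ScriptError(Exception):
    """Raised for errors in the script being parsed."""

class Parser:
    def __init__(self, tokens, symbols):
        self.tokens = list(tokens)   # (kind, value) pairs from the lexer
        self.pos = 0
        self.symbols = symbols       # name -> value; stands in for the symbol table

    def next(self):
        if self.pos >= len(self.tokens):
            raise ScriptError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def find_ident(self, name):
        # Variables must be declared before use, as in C++.
        if name not in self.symbols:
            raise ScriptError(f"undeclared identifier '{name}'")
        return self.symbols[name]

    def primary_expr(self):
        kind, value = self.next()
        if kind == "IDENTIFIER":
            return self.find_ident(value)     # like { $$ = findIdent($1); }
        if kind == "CONSTANT":
            # like { $$ = $1; }  (C-style octal such as 017 is not handled)
            return int(value, 0)
        if kind == "STRING_LITERAL":
            return value[1:-1]                # stand-in for findString: strip the quotes
        if kind == "(":
            result = self.primary_expr()      # assumption: expr is a primary_expr
            if self.next()[0] != ")":
                raise ScriptError("expected ')'")
            return result                     # like { $$ = $2; }
        raise ScriptError(f"unexpected token {value!r}")
```

Parsing the tokens for `(hp)` with the symbol table `{"hp": 30}` returns 30, while parsing an identifier that is not in the table raises ScriptError, which is the "error mechanism" the text refers to.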

Putting It Together

Lex and Yacc are dinosaurs that refuse to die mainly because they are extremely powerful and they work. Even better, the grammars and other input files for every common language in existence are available on the internet for your use. If you wanted a scripting language that looks a lot like PHP but works under your custom semantic actions, it can be yours with a little elbow grease, Lex, and Yacc.

Ultima VIII and Ultima IX's language was defined in this way, and had an extremely complicated grammar. It was fully integrated with C++ classes, debuggable, had support for multiple languages, supported multi-process synchronization, and in many ways had all the features of Java, C++, and Concurrent Pascal. (So now you're probably thinking, "Where can I score a copy? I'm sure EA has it somewhere.")

Gotcha

You might believe that programming a game script is easier than programming in C++, and that junior programmers or even newbie level builders will be able to increase their productivity using game scripts. This is a trap. What tends to happen is that the game scripting system becomes more powerful and complicated as new features are added during development. By the end of the project its complexity approaches or even exceeds that of C++. The development tools for the game script will fall far short of the compilers and debuggers for common languages. This makes the game scripting job really challenging. If your game depends on complicated game scripts, make sure the development tools are up to the task.

Getting a working compiler or interpreter up and running is not the problem. You can have this in a few days. The accompanying support systems and development tools are a mountain of work. Ultima VIII incorporated pre-compilation into the compiler to minimize compile times. It also had a symbolic debugger that could pop up in the game so a game designer could see what was wrong with the script he just wrote. Recovering from semantic errors in compilation is also a difficult and time-consuming proposition for an experienced programmer. If you don't do that, your script compiler will give cryptic and misleading errors, which will waste everyone's time.

The bottom line: Don't underestimate the task of creating your own language for game scripts. Don't do it half-assed, either, or the good people forced to use your lame scripting language will likely have your head on a pike before the game ships—if it ships!

Python and Lua

A somewhat recent addition to scripting languages in computer games has been Python, which is now the scripting language of choice for many computer games. Python is an interpreted, dynamically typed, object-oriented language. One of the coolest features in Python is called a generator. A generator is a function that can yield control back to the calling code and resume where it left off the next time it is called, preserving the values of its local variables. This is incredibly useful for scripting events and stage direction in games.
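Here is a short sketch of what that looks like for stage direction. The scene is a generator that yields one action per frame, and the game loop resumes it until it finishes. The names gwenno_scene() and run(), and the action strings themselves, are made up for this illustration.

```python
def gwenno_scene(distance_to_avatar):
    """Stage direction as a generator: each yield hands control back to the game loop."""
    yield "say: Let me help you by casting this protection spell."
    while distance_to_avatar > 0:
        distance_to_avatar -= 1          # one step per frame
        yield f"walk: {distance_to_avatar} steps to go"
    yield "cast: protection"
    yield "say: All done!"

def run(scene):
    """Stand-in game loop: resume the script once per frame until it finishes."""
    frames = []
    for action in scene:
        # A real loop would render, read input, and update other objects here.
        frames.append(action)
    return frames
```

The local variable distance_to_avatar survives between frames without any thread or explicit state machine. Calling `run(gwenno_scene(2))` produces five actions: the opening line, two walking steps, the spell, and the closing line.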

Python is extensible and can be integrated with other programming languages like C++. Go to www.python.org to learn more. Some of my colleagues prefer Lua, another scripting language that is much leaner than Python. Learn more about Lua at www.lua.org.