YANGHONG

MARSSx86: Multi-core Simulator For x86-based Systems

Introduction

MARSSx86 is a multi-core simulator for x86-based systems. It is built on QEMU, which uses binary translation, so it readily supports full-system emulation and simulation. Users can decide when to enter simulation mode for a region of interest (ROI) and when to return to emulation for normal execution. I have compared several cycle-accurate simulators, namely gem5, Sniper, SESC (SuperESCalar) and MARSSx86.

gem5 supports multi-core simulation, but in my setup it scheduled all cores sequentially (perhaps because I did not know how to configure multithreaded simulation). Sniper is built on Pin's dynamic binary instrumentation, which is slower but more reliable, and it supports multi-core. In my experience, however, Sniper incurs some overhead and hides most data races. As a result, Sniper's results differ greatly from native runs: data races rarely show up, as if there were a barrier after every store instruction. SESC uses the MIPS ISA, so I do not discuss it here. MARSSx86 uses single-threaded QEMU for normal emulation but switches to fine-grained multi-core scheduling once it enters simulation mode. This makes MARSSx86 the most satisfying choice for me.

Installation

MARSSx86 is easy to install, and the official website has detailed documentation. One thing worth emphasizing: remember to set the number of cores you want to simulate at compile time. This value becomes the default for later simulations and cannot be changed through qemu-system-x86_64 parameters.

Simulation

The scripts provided by the official website assume you are sitting in front of a physical Linux machine with a GUI environment. The default console window provided by QEMU is handy, but not for me: I usually connect to my headless testbed over ssh and work inside tmux. So my typical command line is:

./qemu/qemu-system-x86_64 -serial telnet:127.0.0.1:4445,server,nowait \
                          -monitor telnet:127.0.0.1:4444,server,nowait \
                          -m 512 -hda /data03/images/debian6.img \
                          -simconfig ~/sim-conf

This command routes the serial console and the QEMU monitor to two separate telnet connections, so I can switch between those windows easily in tmux. The serial connection (127.0.0.1:4445) is for interacting with the virtual machine's shell, and MARSS commands are typed through the monitor interface (127.0.0.1:4444). The configuration file ~/sim-conf sets some basic options such as -stats and -machine. Another benefit of this setup is that the outputs are well separated: MARSS output goes to the window where you started the QEMU command, while the VM shell output goes to 127.0.0.1:4445.
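Because MARSS commands go through the monitor port, they can also be sent from a program instead of an interactive telnet session. The C sketch below shows one way to do that. The helper name monitor_send is hypothetical, and it assumes the monitor accepts a plain newline-terminated command over a raw TCP connection (QEMU's telnet negotiation bytes are simply never read). I have not checked this against every QEMU version.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send one command line to a QEMU monitor exposed
 * with "-monitor telnet:127.0.0.1:4444,server,nowait".
 * Returns 0 on success and -1 on any error. The reply (including QEMU's
 * telnet negotiation bytes) is never read. */
static int monitor_send(const char *ip, unsigned short port, const char *cmd)
{
    char line[256];
    int len = snprintf(line, sizeof line, "%s\n", cmd);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                      /* command too long */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a dotted IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* send() may write fewer bytes than requested, so loop until done.
     * MSG_NOSIGNAL avoids SIGPIPE if the peer has already closed. */
    size_t off = 0;
    while (off < (size_t)len) {
        ssize_t n = send(fd, line + off, (size_t)len - off, MSG_NOSIGNAL);
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    close(fd);
    return 0;
}
```

For example, monitor_send("127.0.0.1", 4444, "simconfig -run") would issue the same command you would otherwise type into the monitor window.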

Region Of Interest

To focus on the piece of code we are interested in, please refer to Checkpoints in MARSS, though I could not figure out what checkpoints have to do with it…

Anyway, the header file marss/ptlsim/tools/ptlcalls.h provides inline functions that are essentially backdoor instructions for interacting with MARSS. The important ones are:

/* equivalent to `simconfig -run` */
static inline W64 ptlcall_switch_to_sim(void);

/* equivalent to `simconfig -stop` */
static inline W64 ptlcall_switch_to_native(void);

/* equivalent to `simconfig -kill` */
static inline W64 ptlcall_kill(void);

Use the last one, ptlcall_kill, with caution: it effectively kills the QEMU process, so if you call it before ptlcall_switch_to_native (or simconfig -stop), you will probably lose your statistics. On the other hand, if you do not kill QEMU and run your program again, the statistics accumulate on top of the previous run and are appended to your stats file.
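Following that advice, a small wrapper can enforce the stop-before-kill order and also let the same source build natively for comparison runs, where the backdoor instructions presumably have no meaning. This is a sketch under assumptions: USE_MARSS, ROI_BEGIN, ROI_END, MARSS_KILL, marss_noop and roi_end_and_kill are hypothetical names; only the three ptlcall_* functions come from ptlcalls.h.

```c
#include <stdint.h>

#ifdef USE_MARSS
/* Guest build: real backdoor calls from marss/ptlsim/tools/ptlcalls.h. */
#include "ptlcalls.h"
#define ROI_BEGIN()  ((uint64_t)ptlcall_switch_to_sim())
#define ROI_END()    ((uint64_t)ptlcall_switch_to_native())
#define MARSS_KILL() ((uint64_t)ptlcall_kill())
#else
/* Native build (assumed fallback): every call is a no-op returning 0. */
static inline uint64_t marss_noop(void) { return 0; }
#define ROI_BEGIN()  marss_noop()
#define ROI_END()    marss_noop()
#define MARSS_KILL() marss_noop()
#endif

/* Stop simulation first so the statistics are written out,
 * and only then kill QEMU. */
static inline void roi_end_and_kill(void)
{
    (void)ROI_END();
    (void)MARSS_KILL();
}
```

Inside the guest, compile with -DUSE_MARSS and an include path pointing at marss/ptlsim/tools; without the flag every call becomes a no-op that returns 0.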

With these backdoor instructions, we can instrument the program source and collect statistics only while the ROI is executing.

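#include <pthread.h>
#include <stdio.h>
#include "ptlcalls.h"   /* marss/ptlsim/tools/ptlcalls.h */

#define ITERATIONS 100000   /* assumed iteration count */

static volatile int global_count;

/* Example body (assumed): each thread bumps the shared counter with no
 * synchronization, so concurrent increments can be lost. */
static void *thread_fn(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERATIONS; i++)
                global_count++;
        return NULL;
}

int main(void) {
        pthread_t p1, p2;

        printf("Switching to simulation\n");
        ptlcall_switch_to_sim();

        pthread_create(&p1, NULL, thread_fn, NULL);
        pthread_create(&p2, NULL, thread_fn, NULL);

        pthread_join(p2, NULL);
        pthread_join(p1, NULL);

        printf("Stopping simulation\n");
        ptlcall_switch_to_native();

        printf("global_count: %d\n", global_count);
        return 0;
}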
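If thread_fn increments global_count without synchronization, the printed value can fall short of the expected total whenever the threads race, which is exactly the behaviour I found Sniper hiding. One way to separate race effects from other simulator differences is to run a race-free variant in the same ROI and compare. The sketch below uses C11 atomics; atomic_count, atomic_thread_fn, run_atomic_counter and N_ITERS are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_ITERS 100000   /* hypothetical per-thread iteration count */

static atomic_int atomic_count;

/* Race-free variant: each increment is one atomic read-modify-write,
 * so no update can be lost. Relaxed ordering is enough because
 * pthread_join synchronizes before the final load. */
static void *atomic_thread_fn(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_ITERS; i++)
        atomic_fetch_add_explicit(&atomic_count, 1, memory_order_relaxed);
    return NULL;
}

/* Run two threads and return the final count, or -1 if a thread
 * could not be created. */
static int run_atomic_counter(void)
{
    pthread_t p1, p2;
    atomic_store(&atomic_count, 0);
    if (pthread_create(&p1, NULL, atomic_thread_fn, NULL) != 0)
        return -1;
    if (pthread_create(&p2, NULL, atomic_thread_fn, NULL) != 0) {
        pthread_join(p1, NULL);
        return -1;
    }
    pthread_join(p2, NULL);
    pthread_join(p1, NULL);
    return atomic_load(&atomic_count);
}
```

Called between ptlcall_switch_to_sim() and ptlcall_switch_to_native(), this variant should always report 2 * N_ITERS, so any shortfall in the unsynchronized global_count can be attributed to the race rather than to the simulator.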

Query Statistics

Being able to pull the data of interest out of tens of thousands of lines of statistics is important. The command

./mstats.py -y --yaml-out -n base_machine::ooo_.\*::thread0::dcache::fence \
            -t user ~/test.stats

extracts the dcache fence statistics of thread 0 on every core matching ooo_.* from the file, though it is still slow.
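The -n argument selects nodes by a "::"-separated path in which a segment such as ooo_.* acts as a wildcard over per-core names (the backslash in ooo_.\* only keeps the shell from globbing). To make that selection concrete, here is a toy matcher in which each pattern segment is a POSIX extended regex that must cover its key segment entirely. I have not checked how mstats.py interprets -n internally, so treat stats_path_matches as a hypothetical illustration, not its actual logic.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: match a stats path such as
 * "base_machine::ooo_1::thread0::dcache::fence" against a pattern whose
 * "::"-separated segments are extended regexes, e.g. "ooo_.*". Both must
 * have the same number of segments, and each segment must fully match. */
static bool stats_path_matches(const char *pattern, const char *path)
{
    char pat[256], key[256];
    if (strlen(pattern) >= sizeof pat || strlen(path) >= sizeof key)
        return false;
    strcpy(pat, pattern);
    strcpy(key, path);

    char *ps = pat, *ks = key;
    for (;;) {
        char *pe = strstr(ps, "::");
        char *ke = strstr(ks, "::");
        if ((pe == NULL) != (ke == NULL))
            return false;              /* different number of segments */
        if (pe) *pe = '\0';
        if (ke) *ke = '\0';

        /* Anchor the segment regex so it must cover the whole segment. */
        char anchored[260];
        snprintf(anchored, sizeof anchored, "^(%s)$", ps);
        regex_t re;
        if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
            return false;
        int rc = regexec(&re, ks, 0, NULL, 0);
        regfree(&re);
        if (rc != 0)
            return false;

        if (pe == NULL)
            return true;               /* last segment matched */
        ps = pe + 2;
        ks = ke + 2;
    }
}
```

With the pattern from the command above, base_machine::ooo_0::thread0::dcache::fence and base_machine::ooo_1::thread0::dcache::fence both match, while the thread1 counterparts do not.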

Conclusion

That covers the basics of the MARSSx86 simulator. It stays interesting until I have to dig into the hardware simulation code :).
