forked from coloroco/FAME-Z

A Gen-Z fabric emulation environment built on QEMU and IVSHMEM.


zvolchak/FAME-Z

 
 


Gen-Z is a new memory-semantic fabric designed as the glue for constructing exascale computing. It is an open specification evolved from the fabric used in The Machine from Hewlett Packard Enterprise. Such fabrics allow "wide-area" connectivity of computing resources such as CPU, GPU, memory (legacy and persistent) and other devices via a memory-semantic programming model.

The Gen-Z spec and working groups are evolving that standard, and early hardware is beginning to appear. However, there is no open "platform" on which to develop system software. The success of QEMU/IVSHMEM as an emulated development platform for The Machine suggests that an extended use should be considered.

Beyond IVSHMEM - a rudimentary fabric

QEMU has another feature of interest in a multi-actor messaging environment like that of Gen-Z. By applying a slightly different stanza, the IVSHMEM virtual PCI device is enabled to send messages and handle interrupts in a "mailbox/doorbell" setup. An interrupt to the virtual PCI device is generated from an "event notification" issued to the QEMU process by a similarly configured peer QEMU. But how are these peers connected?

The scheme starts with a separate program delivered with QEMU, /usr/bin/ivshmem-server. ivshmem-server establishes a UNIX-domain socket and must be started before any properly-configured QEMU VMs. As a QEMU process starts, it connects to the socket and is given its own set of event channels, as well as being advised of all other peers. The idea is that each VM can issue messages through its PCI device which are delivered directly to another QEMU process as a Linux event. The target QEMU turns that event into a PCI interrupt for its guest OS. ivshmem-server only informs each QEMU of the other peers; it does not participate in further peer-to-peer communication. A backing file must also be specified in this scenario, to be used as a mailbox for larger messages.
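The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not code from ivshmem-server or famez_server.py. As I read the QEMU ivshmem specification, every message is a single 8-byte little-endian signed integer, optionally accompanied by a file descriptor passed as SCM_RIGHTS ancillary data. The names send_msg, recv_msg, greet_client and demo are hypothetical, a pipe stands in for an eventfd, and the real server's last step (sending the new client its own event channels) is left out. It assumes Python 3.9 or later on a Unix-like system.

```python
import os
import socket
import struct
import tempfile

PROTOCOL_VERSION = 0  # value given in the QEMU ivshmem spec, as I understand it


def send_msg(sock, value, fd=None):
    """Send one 8-byte little-endian signed integer, optionally with one fd."""
    data = struct.pack("<q", value)
    if fd is None:
        sock.sendall(data)
    else:
        socket.send_fds(sock, [data], [fd])  # fd travels as SCM_RIGHTS


def recv_msg(sock):
    """Receive one message and return (value, fd or None).

    Sketch only: assumes all 8 bytes arrive in a single read.
    """
    data, fds, _flags, _addr = socket.recv_fds(sock, 8, 1)
    (value,) = struct.unpack("<q", data)
    return value, (fds[0] if fds else None)


def greet_client(sock, client_id, mailbox_fd, peers):
    """Server side of the greeting; peers maps peer ID -> list of event fds."""
    send_msg(sock, PROTOCOL_VERSION)   # 1. protocol version
    send_msg(sock, client_id)          # 2. the ID assigned to this client
    send_msg(sock, -1, mailbox_fd)     # 3. -1 plus the shared mailbox fd
    for peer_id, event_fds in peers.items():
        for fd in event_fds:           # 4. one message per peer per vector
            send_msg(sock, peer_id, fd)


def demo():
    """Run the greeting over a socketpair; return (value, had_fd) tuples."""
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    mailbox = tempfile.TemporaryFile()   # stand-in for the backing file
    pipe_r, pipe_w = os.pipe()           # stand-in for a peer's eventfd
    try:
        greet_client(server_end, 2, mailbox.fileno(), {1: [pipe_w]})
        seen = []
        for _ in range(4):
            value, fd = recv_msg(client_end)
            seen.append((value, fd is not None))
            if fd is not None:
                os.close(fd)             # the received fd is a fresh duplicate
        return seen
    finally:
        server_end.close()
        client_end.close()
        mailbox.close()
        os.close(pipe_r)
        os.close(pipe_w)
```

Calling demo() plays both ends in one process: the client first sees the version and its own ID without descriptors, then the mailbox and one peer announcement with descriptors attached.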


The configuration stanza used for IVSHMEM + IVSHMSG:

-chardev socket,id=ZMSG,path=/tmp/ivshmsg_socket -device ivshmem-doorbell,chardev=ZMSG,vectors=4

or in a domain XML file,

  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='socket,id=ZMSG,path=/tmp/ivshmsg_socket'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='ivshmem-doorbell,chardev=ZMSG,vectors=4'/>
  </qemu:commandline>
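If you maintain several domain files, the two forms above can be generated from one place. The helper below is only a convenience sketch: the function names doorbell_args and qemu_args_to_xml are hypothetical, and the defaults simply repeat the values shown in the stanza. Note that libvirt only accepts the qemu:commandline element when the enclosing domain element declares the qemu XML namespace (xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0').

```python
from xml.sax.saxutils import quoteattr


def doorbell_args(socket_path="/tmp/ivshmsg_socket", chardev_id="ZMSG", vectors=4):
    """Return the QEMU arguments for an ivshmem-doorbell device as a list."""
    return [
        "-chardev", f"socket,id={chardev_id},path={socket_path}",
        "-device", f"ivshmem-doorbell,chardev={chardev_id},vectors={vectors}",
    ]


def qemu_args_to_xml(args, indent="  "):
    """Render an argument list as a <qemu:commandline> block for a domain XML."""
    lines = [f"{indent}<qemu:commandline>"]
    for arg in args:
        # quoteattr escapes the value and adds the surrounding quotes
        lines.append(f"{indent}  <qemu:arg value={quoteattr(arg)}/>")
    lines.append(f"{indent}</qemu:commandline>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(" ".join(doorbell_args()))
    print(qemu_args_to_xml(doorbell_args()))
```

quoteattr emits double-quoted attributes where the hand-written example uses single quotes; both are equivalent XML.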

Another program, ivshmem-client, is also delivered with QEMU. Using just ivshmem-server and multiple ivshmem-clients you can get a feel for the messaging abilities of ivshmem-clients. The final use case is actually VM-to-VM communication over the "IVSHMEM doorbell/mailbox fabric" which I call IVSHMSG. This requires handling the interrupts and mailbox usage in a coordinated fashion.
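To make "coordinated" concrete, here is a toy model of the mailbox/doorbell pattern in plain Python. It is a sketch under loud assumptions: the slot layout (one fixed-size slot per client with a 2-byte length prefix) is invented for illustration and is not the FAME-Z mailbox format, and an ordinary pipe stands in for the event channel and interrupt. The only point it demonstrates is the ordering: the sender finishes writing its slot in the shared file first, then rings the doorbell, and the receiver reads the slot only after the doorbell arrives.

```python
import mmap
import os
import tempfile

SLOT_SIZE = 256  # hypothetical: bytes reserved per client
LEN_BYTES = 2    # hypothetical: length prefix at the start of each slot


def create_mailbox(path, n_clients):
    """Create a zero-filled backing file and map it into memory."""
    size = SLOT_SIZE * n_clients
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), size)  # the mapping outlives the file object


def send_message(mailbox, doorbell_w, sender_id, text):
    """Write the payload into the sender's slot, then ring the doorbell."""
    payload = text.encode()
    if len(payload) > SLOT_SIZE - LEN_BYTES:
        raise ValueError("message too large for one slot")
    base = sender_id * SLOT_SIZE
    start = base + LEN_BYTES
    mailbox[base:start] = len(payload).to_bytes(LEN_BYTES, "little")
    mailbox[start:start + len(payload)] = payload
    os.write(doorbell_w, bytes([sender_id]))  # ring only once the slot is complete


def receive_message(mailbox, doorbell_r):
    """Wait for a doorbell, then read the slot of the sender it names."""
    sender_id = os.read(doorbell_r, 1)[0]  # blocks until a doorbell arrives
    base = sender_id * SLOT_SIZE
    start = base + LEN_BYTES
    length = int.from_bytes(mailbox[base:start], "little")
    return sender_id, mailbox[start:start + length].decode()


def demo():
    """Send one message from client 3 and read it back in the same process."""
    with tempfile.TemporaryDirectory() as tmp:
        mailbox = create_mailbox(os.path.join(tmp, "mailbox"), n_clients=4)
        doorbell_r, doorbell_w = os.pipe()
        try:
            send_message(mailbox, doorbell_w, 3, "hello there")
            return receive_message(mailbox, doorbell_r)
        finally:
            mailbox.close()
            os.close(doorbell_r)
            os.close(doorbell_w)
```

A real implementation also has to stop a sender from overwriting a slot the receiver has not consumed yet, which is where the interlock protocol in the kernel drivers comes in.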

Potential Gen-Z Emulation

VM-to-VM communication will involve a (new) kernel driver and other abstractions to hide the mechanics of IVSHMSG. If the driver functions as a simple Gen-Z bridge, it is anticipated that a great deal of "pure Gen-Z" software development can be done on this simple platform. Certain Gen-Z primitive operations for discovery and crawlout would also be abetted by intelligence "in the fabric", i.e., the ivshmem-server process.

Extending the existing C program is not simple because it is not a standalone program: ivshmem-server.c is written within the QEMU build framework. C is also a bit limited for the higher-level data constructs anticipated for a Gen-Z emulation.

This FAME-Z project started out as a rewrite of ivshmem-server in Python using Twisted as the network-handling framework. famez_server.py is run in place of ivshmem-server. It correctly serves ivshmem-client as well as real QEMU processes.


FAME-Z is intended for simple device connectivity, i.e., bridge-to-bridge with some simple switches. As with all emulations there will be a point at which the effort fails to reproduce the environment accurately. This experiment will find that breakdown point and determine whether the functional section is indeed sufficient to allow Gen-Z software development in the absence of hardware.

A new feature over ivshmem-server is that famez_server can receive messages from clients. Thus famez_server can participate in fabric messaging and serve as fabric intelligence (i.e., a switch). Again, the accuracy and validity of the project must lead to the creation of "pure Gen-Z" software above it, meaning software that will run on real hardware someday without modification.

Running the Python rewrites

Alongside famez_server.py there is also a famez_client.py to supplant the stock ivshmem-client. It has an expanded command set over the original. Over time its use as a monitor/debugger/injector will certainly grow. To see these packages function as a simple chat framework, you don't need QEMU.

  1. Clone this repo
  2. Install python3 package "twisted"
  3. In one terminal window run './famez_server.py --nClients 8'. This accepts up to eight attached clients such as famez_client.py, ivshmem-client, or a properly-configured QEMU process. By default this creates /tmp/ivshmsg_socket to which clients attach, and /dev/shm/famez_mailbox which is shared among all clients for messaging.
  4. In a second (or more) terminal window run 'ivshmem-client -S /tmp/ivshmsg_socket'. You'll see each client get added in the server log output.
  5. In one of the clients, hit return, then type "help". Play with sending messages to the other client(s) or the server.

Connecting VMs

While a QEMU process makes the actual connection to famez_server.py, the messaging endpoints live in the VM running inside QEMU. Building a QEMU image is beyond the scope of this project. The FAME project mentioned previously is a great place to accomplish that.

Before starting down this path, stop and restart famez_server.py with the additional argument "--recycle". This extension over the stock server makes the QEMU members of the fabric a little more resilient when other peers die.

Then build and boot your VMs.

  1. Log in to each VM and git clone this repo.
  2. "cd famez_kernel/modular" and "make" which should create two kernel modules, "famez.ko" and "famez_bridge.ko"
  3. "sudo insmod famez.ko famez_verbose=2". dmesg output should indicate the driver found and attached to the IVSHMSG device, a "Redhat Emulation" card.
  4. "sudo insmod famez_bridge.ko fzbridge_verbose=2". Again dmesg output should show the driver bound to the famez driver. There should also be a new device file /dev/famez_bridge_xx where xx matches the PCI pseudo-device address in lspci.
  5. Run a famez_client.py program and execute "list". This shows you the participants and their IVSHMSG IDs.
  6. On a VM, echo "I:hello there" > /dev/famez_bridge_xx, where "I" is the IVSHMSG client number of the target client.
  7. On a different VM (the target of the echo command above) "cat < /dev/famez_bridge_xx" and you should see the message.
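The echo in the steps above can also be done from a program. The following is a small sketch, assuming only what those steps state: a message is the target's IVSHMSG client number, a colon, and the text, written to the bridge device file. The function names are hypothetical, the trailing newline merely mirrors what echo sends, and the demo writes to a temporary file rather than a real /dev/famez_bridge_xx so that it runs anywhere.

```python
import os
import tempfile


def format_bridge_message(target_id, text):
    """Build 'ID:text' followed by a newline, the same bytes echo would write."""
    target_id = int(target_id)
    if target_id < 0:
        raise ValueError("client number must not be negative")
    return f"{target_id}:{text}\n"


def send_via_bridge(device_path, target_id, text):
    """Write one message to the bridge device (or to any file, for testing)."""
    with open(device_path, "w") as dev:
        dev.write(format_bridge_message(target_id, text))


def demo():
    """Send to a temporary file standing in for the device and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_device = os.path.join(tmp, "famez_bridge_test")  # not a real device
        send_via_bridge(fake_device, 3, "hello there")
        with open(fake_device) as f:
            return f.read()
```

On a VM you would pass the actual device path from step 4 in place of the temporary file.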

You can exercise the link a little harder with the programs in the "rocpyle" directory.

BUGS

As the QEMU docs say, "(IVSHMSG) is simple and fragile" and the driver modules are still under active development. Sometimes a QEMU hangs and you have to restart it. Sometimes you have to restart all QEMUs. Rarely do you have to restart a server that was running with --recycle, but it happens. The way I crafted an interlock protocol in the kernel drivers can cause a VM to go into an RCU stall, which usually leads to a virtual panic.

TODO

As mentioned earlier the idea is to write "up the stack" for Gen-Z support for the various entity "Managers" in the specification. So it's back to the spec which is beyond the scope of this document, but more will be revealed.
