Major changes in Open SoC Debug ahead!

Over roughly the last two years, Open SoC Debug has grown into a reliable debugging tool for the needs of lowRISC and OpTiMSoC. A lot of effort went into fixing small bugs to improve reliability and into adding features such as the emulated UART device, UART-DEM. And it was worth the effort, as we’ve seen over the summer when we added Linux support to OpTiMSoC. Control flow traces generated by the CTM modules, as well as the UART-DEM module, were major enablers for this work.

This work has also given us a better understanding of areas where our current design limits extensibility. Furthermore, we found some parts of the reference implementation to be tricky or fragile to use. To fix all this, we’ve started brainstorming and refactoring the spec and the reference implementation.

What changes are coming?

Debug Interconnect Addresses are now 16 bits wide

Previously, addresses used to identify debug modules were 10 bits wide, allowing up to 1024 modules to be present in a debug system. We’re extending addresses to 16 bits, addressing up to 65536 debug modules. (“65k should be enough for everybody.”) This change also requires modifications to the packet format of the on-chip interconnect, and hence modifications to all debug modules which send and receive packets. 16-bit addresses enable us to address more debug modules, but they also enable us to reserve some parts of the address for special purposes. (More on that later.)
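To make the numbers concrete, here is a minimal Python sketch of the address arithmetic. The helper names and the idea of splitting an address into an upper and a lower field are illustrative assumptions only; this post does not define how the reserved address bits will be laid out.

```python
from typing import Tuple

OLD_ADDR_BITS = 10  # address width before the change (from the text above)
NEW_ADDR_BITS = 16  # address width after the change


def address_space(bits: int) -> int:
    """Number of distinct addresses representable with the given width."""
    return 1 << bits


def split_address(addr: int, low_bits: int) -> Tuple[int, int]:
    """Split a 16-bit address into (upper, lower) fields.

    `low_bits` is a hypothetical parameter: how many bits, if any, end up
    reserved for special purposes is not specified in this post.
    """
    if not 0 <= addr < address_space(NEW_ADDR_BITS):
        raise ValueError("address does not fit into 16 bits")
    if not 0 <= low_bits <= NEW_ADDR_BITS:
        raise ValueError("low_bits must be between 0 and 16")
    return addr >> low_bits, addr & ((1 << low_bits) - 1)
```

With these helpers, `address_space(OLD_ADDR_BITS)` gives the 1024 modules of the old scheme and `address_space(NEW_ADDR_BITS)` the 65536 of the new one.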

Changes to the base register map

All debug modules in OSD conform to a common base register map. These registers describe, among other things, the type of the module and its version. To make this more extensible, we’ve split the module type into two fields (vendor and type identifier) and rearranged them in the register map. The specification has already been updated to describe the new register map.

Cocotb-based hardware testing/verification

Currently the hardware portion of the OSD reference implementation is mainly tested using manual tests, together with system-level tests in OpTiMSoC and lowRISC. To make changes to the code base easier and the results more predictable, we’re adding unit tests to the reference implementation using the excellent Python-based cocotb unit testing framework.

A full rewrite of the software reference implementation

The software running on the host is the main entry point for users to OSD. It must be as robust as possible to give a smooth debugging and tracing experience. But it also must be extensible to add new debug tools easily.

The current implementation has a couple of very nice properties:

- Multiple debug tools can consume data coming from the target. For example, run-control debugging with GDB doesn’t interfere with logging a system trace.
- A scriptable interface makes it easy to automate the interaction with OSD-enabled SoCs.
- Debug tools can be separated into individual processes, or combined into one process.

We’ll keep these properties, but extend them in a couple of ways:

- Instead of using our own TCP-based communication protocol between the debug tools on the host, we’ll rely on ZeroMQ. Using ZeroMQ is great, as it solves a couple of problems at once:
  - It supports different types of transports. Components connected by ZeroMQ can live in the same process using the inproc shared memory transport, but they can also live on different machines using the tcp transport. All of that is fully transparent to the application.
  - ZeroMQ has bindings for just about any programming language out there. This enables writing debug tools in all those languages, as long as the host communication protocol is adhered to (something we are also documenting in more detail as part of the refactoring process).
  - Finally, ZeroMQ is great at handling all the tiny little details of communication over unreliable links: connects and disconnects, timeouts, signal handling, and much more.
- We’re redesigning the architecture of libopensocdebug to make it easier to test. This mostly involves splitting the current matryoshka-doll-like structure (a debug tool encapsulates the tool client component, which encapsulates the tool server component and ultimately GLIP for communication) into smaller classes. For example, GLIP is no longer encapsulated in libopensocdebug; it remains separate, and the two libraries are connected at a higher level. A more detailed look at the new architecture will follow in a later blog post.
- We’re adding unit tests and code coverage metrics to make sure we’re not breaking things when extending our implementation in the future.
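The transport transparency that ZeroMQ offers can be illustrated with a small sketch using the pyzmq bindings. This is not the OSD host communication protocol: the function name, the endpoint name, and the "ack:" payload are placeholders made up for this example. The point is only that the same code runs unchanged whether the endpoint uses inproc or tcp.

```python
import zmq


def roundtrip(bind_endpoint: str, payload: bytes = b"hello") -> bytes:
    """Send one request over the given transport and return the reply.

    Hypothetical helper for illustration; the message contents are
    placeholders, not OSD protocol messages.
    """
    ctx = zmq.Context()
    server = ctx.socket(zmq.REP)
    client = ctx.socket(zmq.REQ)
    for sock in (server, client):
        sock.setsockopt(zmq.LINGER, 0)       # do not block when closing
        sock.setsockopt(zmq.RCVTIMEO, 2000)  # raise after 2 s instead of hanging
    try:
        server.bind(bind_endpoint)
        # Ask ZeroMQ for the resolved endpoint (needed for wildcard TCP ports).
        endpoint = server.getsockopt(zmq.LAST_ENDPOINT).decode()
        client.connect(endpoint)
        client.send(payload)
        # The "server" side answers the request it just received.
        server.send(b"ack:" + server.recv())
        return client.recv()
    finally:
        client.close()
        server.close()
        ctx.term()
```

Calling `roundtrip("inproc://osd-demo")` keeps both ends inside one process, while `roundtrip("tcp://127.0.0.1:*")` sends the same messages over a TCP socket on a port chosen by the operating system; only the endpoint string changes.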

One more thing: Subnets

Something that mostly exists in our heads right now is OSD subnets. For now this only means that all debug tools on the host are part of one “subnet”, and all debug modules on the target device are part of another. There’s more to it, but we’ll keep that for a later time.

So where’s the code?

A large rework like the one we’re currently attempting involves changing code in various places. Unfortunately, it’s not always possible to completely decouple the dependencies between these changes. This is especially true for the changes to the communication protocol, which require changes to both software and hardware parts.

So to keep OSD working and usable by downstream projects while we’re working on this major refactoring, we’ve decided to take the following approach:

- We’ll keep the master branches of the reference implementation in a working, stable state. All our refactoring will happen on a different branch, called osd-next. If you’re currently using OSD in your designs (that’s most likely only true for OpTiMSoC and lowRISC), stay on the master branch for now.
- The specification is continuously updated to reflect the current state of our thinking, i.e. how we want the spec to be. This, in turn, implies that the reference implementation temporarily diverges from the spec. But since there has been no formal release of the spec anyway, we feel that this approach shouldn’t cause too many problems for our users.
- We’ll try to review and merge individual chunks of work making up the rewrite as usual, and commit them into the respective osd-next branches.
- The new software reference implementation is currently in the process of being cleaned up for an initial review round. It still has a lot of rough edges, but already exhibits the properties we’re looking for, and I hope no major redesign will be needed before it can get merged. Expect a first pull request in the coming weeks.

In addition to the upstream work on OSD, OpTiMSoC is maintaining a continuously updated branch, osd-rework, in its repository. This ensures that the changes in OSD fit the needs of our downstream projects.

Give feedback and get involved

Changes like the ones we’re attempting here present an excellent opportunity to get involved. Let us know on the mailing list what you think, or what your questions are.

by Philipp Wagner
on 21 September 2017