
mikeurbach · 2026-02-22T19:35:50

I gave a short talk about compiling PyTorch to Verilog at Latte '22. Back then we were just looking at a simple dot product operation, but the approach could theoretically scale up to whole models.

https://capra.cs.cornell.edu/latte22/paper/2.pdf

https://www.youtube.com/watch?v=QxwZpYfD60g

mikeurbach · on Dec 27, 2023

Disclaimer: I work on Chisel and CIRCT, and these opinions are my own.

These are good points, and I think Chisel is actually improving in these areas recently. Chisel is now built on top of the CIRCT[1] compiler infrastructure, which uses MLIR[2] and allows capturing much more information than just RTL in the intermediate representations of the design. This has several benefits.

Regarding the problem of converting from HDL to System Verilog, and associating the tool outputs with your inputs: a ton of effort has gone into CIRCT to ensure its output is decently readable by humans _and_ has good PPA with popular backend tools. There is always room for improvement here, and new features are coming to Chisel in the form of intrinsics and new constructs to give designers fine-grained control over the output.

On top of this, a new debug[3] intermediate representation now exists in CIRCT, which associates constructs in your source HDL with the intermediate representation of the design as it is optimized and lowered to System Verilog. Think of it like a source map that allows you to jump back and forth between the final System Verilog and the source HDL. New tooling to aid in verification and other domains is being built on top of this.
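
To make the source map analogy concrete, here is a minimal sketch in Python. It is not CIRCT's debug dialect or any real API; `Loc`, `SourceMap`, and the file names are hypothetical, and it only models a two-way lookup between emitted lines and source lines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A file/line location (hypothetical, for illustration only)."""
    file: str
    line: int


class SourceMap:
    """Toy two-way map between source HDL and emitted System Verilog."""

    def __init__(self):
        self._to_source = {}                  # emitted Loc -> source Loc
        self._to_output = defaultdict(list)   # source Loc -> emitted Locs

    def record(self, output, source):
        # One source line may expand into many emitted lines.
        self._to_source[output] = source
        self._to_output[source].append(output)

    def source_of(self, output):
        return self._to_source.get(output)

    def outputs_of(self, source):
        return list(self._to_output.get(source, []))


smap = SourceMap()
smap.record(Loc("Stack.sv", 40), Loc("Stack.scala", 12))
smap.record(Loc("Stack.sv", 41), Loc("Stack.scala", 12))
```

Looking up `smap.source_of(Loc("Stack.sv", 41))` goes from the emitted file back to the source, and `smap.outputs_of(Loc("Stack.scala", 12))` goes the other way.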

Besides this, the combination of Chisel and CIRCT offers a unique solution to a deeper problem than dealing with minor annoyances in System Verilog: capturing design intent beyond the RTL. New features have been added to Chisel to capture higher-level system descriptions, and new intermediate representations have been added to CIRCT to maintain this information and its association to the design. For example, you could add information about bus interfaces directly in Chisel, and have a single source of truth generate both the RTL and other collateral like IP-XACT. As the design evolves, the collateral stays up to date with the RTL. I gave a talk[4] at a CIRCT open design meeting that goes into more detail about what's possible here.
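
As a rough illustration of the single-source-of-truth idea, sketched in Python rather than Chisel: one bus description drives both a port list and an XML fragment. The `Port` and `BusInterface` types and the XML element names are made up for this sketch, and the XML is not schema-valid IP-XACT.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "input" or "output"
    width: int


@dataclass(frozen=True)
class BusInterface:
    name: str
    ports: tuple


def emit_sv_ports(bus):
    """Render the bus as System Verilog port declarations."""
    lines = []
    for p in bus.ports:
        rng = f"[{p.width - 1}:0] " if p.width > 1 else ""
        lines.append(f"{p.direction} logic {rng}{bus.name}_{p.name}")
    return ",\n".join(lines)


def emit_collateral(bus):
    """Render the same bus as an XML fragment (illustrative element names)."""
    root = ET.Element("busInterface", name=bus.name)
    for p in bus.ports:
        ET.SubElement(root, "port", name=p.name,
                      direction=p.direction, width=str(p.width))
    return ET.tostring(root, encoding="unicode")


# Hypothetical bus: both outputs below derive from this one description.
bus = BusInterface("cfg", (Port("valid", "input", 1), Port("data", "input", 32)))
sv_ports = emit_sv_ports(bus)
collateral = emit_collateral(bus)
```

Changing a port width in `bus` changes both `sv_ports` and `collateral`, which is the property that keeps the collateral up to date with the RTL.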

[1] https://circt.llvm.org/

[2] https://mlir.llvm.org/

[3] https://circt.llvm.org/docs/Dialects/Debug/

[4] https://sifive.zoom.us/rec/share/MhHtXPg_7iZk-QWw0A66CaBJDGs...

erichocean · on Dec 27, 2023

Thank you for this write up.

gchadwick · on Dec 27, 2023

Thanks for the info. These all certainly sound like promising developments, though I still think there are major hurdles to overcome.

> good PPA with popular backend tools

Getting good PPA for any given thing you can express in the language is only part of the problem. The other aspect is how easy does the language make it to express the thing you need to get the best PPA (discussed in example below)?

> Think of it like a source map that allows you to jump back and forth between the final System Verilog and the source HDL.

This definitely sounds useful (I wish synthesis tools did something similar!) but again it's only part of the puzzle here. It's all very well to identify the part of the HDL that relates to some physical part of the circuit but how easy is it to go from that to working out how to manipulate the HDL such that you get the physical circuit you want?

As a small illustrative example, here's a commit for a timing fix I did recently: https://github.com/lowRISC/opentitan/commit/1fc57d2c550f2027.... It's for a specialised CPU for asymmetric crypto. It has a call stack that's accessible via a register (actually a general stack but typically used for return addresses for function calls). The register file looks to see if you're accessing the stack register, in which case it redirects your access to an internal stack structure and when reading returns the top of the stack. If you're not accessing the stack it just reads directly from the register file as usual.

The problem comes (as it often does in CPU design) in error handling. When an error occurs you want to stop the stack push/pop from happening (there are multiple error categories and one instruction could trigger several of them; see the documentation at https://opentitan.org/book/hw/ip/otbn/index.html for details). Whether you observed an error or not was factored into the 'are you doing a stack push or pop' calculation, which in turn was factored into the mux that chose between data from the top of the stack and data from the register file. The error calculation is complex and comes later on in the cycle, so factoring it into the mux was not good, as it made the register file data turn up too late. The solution, once the issue was identified, was simple: separate the logic deciding whether the action itself should occur (effectively the flop enables for the logic making up the stack) from the logic calculating whether we had a stack or register access (which is based purely on the register index being accessed). The read mux then uses the stack or register access calculation without the 'action actually occurs' logic and the timing problem is fixed.
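
A toy Python model of the shape of that fix, not the actual OTBN RTL: `STACK_REG` and the function names are hypothetical, and the model assumes the read data is simply discarded whenever an error is raised.

```python
STACK_REG = 1  # hypothetical index of the register that aliases the stack


def read_before(idx, regs, stack_top, error):
    """Before: the mux select depends on the late-arriving error signal."""
    doing_pop = (idx == STACK_REG) and not error
    return stack_top if doing_pop else regs[idx]


def read_after(idx, regs, stack_top):
    """After: the mux select depends only on the register index."""
    is_stack_access = (idx == STACK_REG)
    return stack_top if is_stack_access else regs[idx]


def stack_pop_enable(idx, error):
    """The error still gates whether the stack state actually changes."""
    return (idx == STACK_REG) and not error


regs = [10, 11, 12, 13]
stack_top = 99
```

With no error the two read functions return the same data for every index, so moving the error term out of the select changes nothing in the normal case; the error only affects `stack_pop_enable`, which models the flop enables rather than the read data path.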

To get to this fix you have two things to deal with, first taking the identified timing path and choosing a sensible point to target for optimization and second actually being able to do the optimization. Simply having a mapping saying this gate relates to this source line only gets you so far, especially if you've got abstractions in your language such that a single source line can generate complex structures. You need to be able to easily understand how all those source lines relate to one another to create the path to choose where to optimise something.

Then there's the optimization itself, pretty trivial in this case as it was isolated to the register file, which already had separate logic to determine whether we were actually going to take the action vs determine if we were accessing the stack register or a normal register. Because of SystemVerilog's lack of powerful abstractions making a tweak to get the read mux to use the earlier signal was easy to do but how does that work when you've got more powerful abstractions that deal with all the muxing for you in cases like this and the tool is producing the mux select signal for you. How about where the issue isn't isolated to a single module and is spread around (e.g. see another fix I did https://github.com/lowRISC/opentitan/commit/f6913b422c0fb82d... which again boils down to separating the 'this action is happening' from the 'this action could happen' logic and using it appropriately in different places)?

I haven't spent much time looking at Chisel, so it may be that there are answers to this, but if it gives you powerful abstractions you end up having to think harder to connect those abstractions to the physical circuit result. A tool telling you gate X was ultimately produced by source line Y is useful but doesn't give you everything you need.

> the combination of Chisel and CIRCT offers a unique solution to a deeper problem than dealing with minor annoyances in System Verilog: capturing design intent beyond the RTL

> you could add information about bus interfaces directly in Chisel, and have a single source of truth generate both the RTL and other collateral like IP-XACT.

Your example here certainly sounds useful but to me at least falls into the bucket of annoying and tedious tasks that won't radically alter how you design nor the final quality and speed of development. Sure, if you need to generate IP-XACT for literally thousands of variations of some piece of IP this kind of thing is essential, but practically you have far fewer variations you actually want to work with, and the manual work required is annoying busy work that will generate some issues but you can deal with it. Then for the thousands of variations case the good old pile o' python doing auto-generation can work.

Certainly having a solution based upon a well designed language with a sound type system sounds great and I'll happily have it but not if this means things like timing fixes and ECOs become a whole lot harder.

Thanks for the link to the video I'll check it out.

Maybe I should make one of my new year's resolution to finally get around to looking at Chisel and CIRCT more deeply! Could even have a crack at toy HDL in the form of the fixed SystemVerilog with a decent type system solution I proposed above using CIRCT as an IR...

seldridge · on Dec 27, 2023

> Could even have a crack at toy HDL in the form of the fixed SystemVerilog with a decent type system solution I proposed above using CIRCT as an IR...

This is the exact type of activity that CIRCT is trying to make easier! There are enough core hardware dialects that new languages (generator-style embedded domain-specific languages or actual languages) can be built quickly, and MLIR gives you the flexibility to define _new_ dialects that represent the constructs and type system of the language you are trying to build while still interoperating with or lowering to existing dialects.

This was the kind of thing that didn't work well with Chisel's FIRRTL IR as it was very closely coupled to Chisel and its opinions. Now FIRRTL is just another CIRCT dialect and, even if you're not using Chisel and FIRRTL, you're benefitting from the shared development of the core hardware dialects and SystemVerilog emission that Chisel designs rely on.

mikeurbach · on Dec 27, 2023

> To get to this fix you have two things to deal with, first taking the identified timing path and choosing a sensible point to target for optimization and second actually being able to do the optimization.

> Because of SystemVerilog's lack of powerful abstractions making a tweak to get the read mux to use the earlier signal was easy to do but how does that work when you've got more powerful abstractions that deal with all the muxing for you in cases like this and the tool is producing the mux select signal for you.

Thanks for the example and illustrating a real world change. In this specific case, Chisel provides several kinds of Mux primitives[1], which CIRCT tries to emit in the form you'd expect, and I think Chisel/CIRCT would admit a similarly simple solution.

That said, there are other pain points here where Chisel's higher-level abstractions make it hard to get the gates you want, or to make a simple change when you know how you want the gates to be different. A complaint we hear from users is the lack of a direct way to express complex logic in enable signals to flops. Definitely something we can improve, and the result will probably be new primitive constructs in Chisel that are lower-level and map more directly to the System Verilog that backend tools expect. This is one example of what I was alluding to in my previous reply about new primitives in Chisel.

> Your example here certainly sounds useful but to me at least falls into the bucket of annoying and tedious tasks that won't radically alter how you design nor the final quality and speed of development.

I guess it depends on your goals. I spoke[2] about CIRCT and the new features in this realm at Latch-Up 2023, and after the talk people from different companies seemed very excited about this. For example, someone from a large semiconductor company was complaining about how brittle it is to maintain all their physical constraints when RTL changes.

> Maybe I should make one of my new year's resolution to finally get around to looking at Chisel and CIRCT more deeply!

We'd love to hear any feedback!

> Could even have a crack at toy HDL in the form of the fixed SystemVerilog with a decent type system solution I proposed above using CIRCT as an IR...

That's exactly what the CIRCT community is hoping to foster. If you're serious about diving in, I'd recommend swinging by a CIRCT open design meeting. The link is at the top of the CIRCT webpage. These can be very informal, and we love to hear from people interested in using CIRCT to push hardware description forward.

[1] https://www.chisel-lang.org/docs/explanations/muxes-and-inpu...

[2] https://www.youtube.com/watch?v=w_W0_Z3n9PA

mikeurbach · on Nov 8, 2023

Since you brought up Sasa Juric, I will second that and also mention their book Elixir in Action. It really helped me get from toy examples to feeling confident running the BEAM in production. This is of course Elixir-centric, but the parts about OTP, inspecting running applications, etc. are really about the BEAM.

bmitc · on Nov 9, 2023

Yes, that book is excellent, and I definitely recommend it to anyone new to Elixir. I'd also recommend Joe Armstrong's Erlang book, even if one is eventually wanting to use Elixir.

mikeurbach · on March 25, 2022

I used to go very deep on philosophical discussions about the nature of sandwiches with my friends. I’m still digesting the cube rule, but here’s what we came up with:

Every sandwich has an axis. The axis is through the filling. You can roughly divide all sandwiches based on whether you eat along the axis (what we called axially), or around it (what we called radially).

A burrito is consumed axially, while a crunchwrap supreme is consumed radially.

Hope this is interesting, we’ve found pretty much every sandwich can be classified as radial or axial, which seemed like an achievement.

Groxx · on March 25, 2022

So what would a breaded thing count as? Like a chicken or fish stick? It wasn't "filled" in any direction. At the extreme you could have a breaded, fried donut hole - a sphere that had bread deposited onto it (i.e. the donut hole is the filling of the sandwich), no axis at any stage, and you eat it from all directions equally.

mikeurbach · on Nov 28, 2021

We had the pleasure of hosting Dr. Manohar at a CIRCT weekly discussion session earlier this year. He presented much more recent work if anyone is interested. The talk and discussion were recorded here: https://sifive.zoom.us/rec/play/Bg99_niHh9OG_8uE_nhaz6otxvA0...

EDIT: talk begins around 7 minutes.

mikeurbach · on July 6, 2021

I contribute to CIRCT, so I feel like I should chime in here. I personally hope that it can provide exactly the kind of unifying IRs we are all hoping for in the open-source community. The fact that the tools are implemented in C++ may be a win for some, but I think the CIRCT project is compelling for much deeper reasons. The README states the ambition clearly:

> By working together, we hope that we can build a new center of gravity to draw contributions from the small (but enthusiastic!) community of people who work on open hardware tooling.

There are weekly community meetings that are open to the public, and we have guest speakers from all sorts of interesting projects in the open-source community. Many of those are leading to collaborations and contributions to CIRCT.

There hasn't been much (any?) discussion of CIRCT on HN, but rather than present the reasons I think it's so great here, I'll point to a talk[1] I gave earlier this year and a much better talk[2] Chris Lattner gave shortly thereafter, both of which lead up to the "Why CIRCT?" question in the second half.

Looking back at that SymbiFlow thread, I see familiar faces that are now actively contributing to CIRCT. There are mentions of many different hardware IRs in some of the posts, but at least three have first-class support in CIRCT today: FIRRTL[3], LLHD[4], and Calyx[5]. This is all very recent and experimental, but I would say the results are already promising.

[1] https://slideslive.com/38955645/applying-circuit-ir-compiler...

[2] https://www.youtube.com/watch?v=4HgShra-KnY

[3] https://circt.llvm.org/docs/Dialects/FIRRTL/

[4] https://circt.llvm.org/docs/Dialects/LLHD/

[5] https://circt.llvm.org/docs/Dialects/Calyx/

mikeurbach · on May 27, 2021

I've been enjoying placing a drop right along the divide and seeing which way it goes. For example, up near Lenawee Mountain (aka A-basin). I've also noticed that up there, sometimes there is no route to either ocean. I guess it ends up in a high-alpine catchment and stays on the divide.

mikeurbach · on May 7, 2020

There was some interesting discussion between the LLHD[1] and MLIR[2] folks about just this topic recently[3]. My takeaway: modeling behavioral HDL semantics in the IR is a huge mess that has to account for all of the complexity in the existing HDL landscape. However, there is hope for modeling structural HDL semantics in an IR that could be the target for many languages.

You mentioned LLVM, and I think of MLIR as sort of a successor to LLVM. I'm hopeful that a low-level HDL IR is standardized as an MLIR dialect, so the compiler ecosystem and the hardware ecosystem can start to unite.

[1] https://news.ycombinator.com/item?id=22825107

[2] https://news.ycombinator.com/item?id=22429107

[3] https://drive.google.com/file/d/1x7B0IRdcJ5JBQvfHbPUBcShbTFC...

ancharm · on May 7, 2020

I love the idea of LLHD. Also LNAST. There are some great ideas here.

https://github.com/masc-ucsc/livehd

mikeurbach · on April 10, 2020

I find it fascinating this research and MLIR[1] seem to have been developed independently around the same time. Figure 1 tells the same story in both papers. The authors mention the possibility of representing their concepts in MLIR, which could be really interesting.

[1] https://arxiv.org/pdf/2002.11054.pdf

mikeurbach · on Sept 13, 2019

When I read the headline and saw the source, I assumed this would be about GraphQL. I know Instagram utilizes GraphQL, for example on the web client, so now I'm wondering how that fits in.