John Reagan Interview on LLVM, Part 2

This is the second half of eCube’s interview with VMS Software Inc.’s John Reagan, who heads the compiler group. We have condensed the interview into two blogs; this part expounds on the history of LLVM and the reasons why the OpenVMS compilers use it on x86-64. We continue the interview from the point where the last blog left off.

John Reagan, VMS Software, Inc.

History of LLVM and compiler technology

Q: We had some discussions earlier on the history of LLVM. I have done a little research on LLVM, and it was started around 2000. You had some earlier information on this technology, and I was hoping you could share that with us.

A: I think your timeframe is correct. It was a project by Chris Lattner at the University of Illinois at Urbana-Champaign. Digital invented GEM and other back-end projects, but we never shared them; this was before the days of open sourcing. I looked at what Chris did with LLVM and at his original master’s thesis, and while he did great work, and I don’t think he cheated or stole it from anyone else, some of what he invented is conceptually similar to GEM. We had it in GEM already, and had had it for roughly ten years, except we were never able to tell anyone about it. Chris, however, did create a framework that the GEM backend did not have.

So, to back up and answer your earlier question: I started with Digital in 1983 working on compilers, and I worked on VMS compilers all the way to the point when HP transitioned VMS engineering to India and that engineering work went over there. I moved to another area inside HP and worked on compilers for HP-UX for a little bit, but also compilers for the NonStop group, and one of the things we did was port NonStop to x86, several years before what we’re doing now for VMS. We had to decide then which back-end technology we would use to generate x86 instructions. At the time the choices were: go to Intel and license their back end for a hefty sum of money and probably some royalty structure; or GCC, the de facto big gorilla in the room in the open source community, which comes with the GPL license for better or for worse and a very vocal, active user community. There was another one called Open64, which is what NonStop chose, but it has since withered because I think everybody who was there has switched over to LLVM. At the time LLVM was very new and I think it only had its x86 support, so NonStop didn’t choose it, but we had a chance to look at it. Even then I said I liked it, because it has a really strong flavor of what I knew from GEM, so it fit me from a personality point of view.

The other thing is, I think LLVM and the LLVM Foundation have done a really good job of promoting LLVM, and it is actively used by Apple, Google, NVIDIA, Sony, Qualcomm, and many others. Major players in the industry are now using LLVM; the version of Android on my Pixel 2 phone was built with LLVM. It’s putting pressure on GCC, which now has a real competitor. LLVM has great online documentation, and the community is very helpful and reasonable. I’ve been to several of the LLVM developers’ conferences and have been well received; it has been helpful in getting VMS additional visibility. I gave a presentation at the Fall 2017 LLVM conference on using LLVM for our compilers. They record all of the sessions at the conference and put them on a YouTube channel; just search for “youtube John Reagan LLVM” and you will find my lightning talk, where I squeeze everything into five minutes. I apologize, it’s cringeworthy watching me do it, but it gives a good high-level survey of what we’re doing, and I’m trying to keep VMS in their minds. Some of the older guys like us say, “Oh, yeah, I remember VMS,” but many of the new people don’t.

Q: Thanks, that is useful. Well, I noticed when I was researching LLVM that COBOL was not listed as a language.

A: That I don’t know: whether someone has put a COBOL frontend on LLVM. If not, we might be the first. I know of no one who has put a BASIC frontend on it, so we will be the first for that. There have been Pascal frontends, or Pascal-like languages, and I think there have been Fortrans on it before. As a matter of fact, there is now a new Fortran compiler for LLVM, and I love their naming scheme for it. If “clang” is the name of the C frontend for LLVM, what do you think the Fortran one should be named?

Q: It should be “flang”!

A: It is exactly “flang”, and it’s an open source compiler from NVIDIA. It has a lot of DEC Fortran extensions in it, so it is one of the things we may provide in the future in addition to our own Fortran 90/95 compiler for VMS, the one you’ve known for years, which generates GEM and works with the GEM-to-LLVM converter. If you want things from the newer Fortran standards, you’ll have a newer Fortran compiler. You might have to give up some of your ancient DEC Fortran features; if you have any RAD50 constants or something weird like that still in your Fortran program, you’ll have to get those out.

Also, by using LLVM and clang (we talked about all the other languages, but what I haven’t talked about is what we’re doing for C++): the C++ compiler on the Alpha was a Digital-written C++ compiler that conformed to whatever C++ was at the time; there wasn’t even a standard yet, this was early in the 90s. Then on Itanium, the C++ compiler is actually derived from the Intel C++ compiler and it is only C++03 standard compliant. People want more standards support: since C++03 there have been C++11, C++14, and C++17, and the committee is already working on C++20, so it’s a very active language and people want all the new features in those standards. So we need a better C++ compiler for VMS on x86. I can’t use the ancient Alpha one; it doesn’t have anything in it that I want. The one on Itanium is one we licensed from Intel, and I don’t even have the ability to take that forward to x86 if I wanted to. So I said we need a new C++ compiler, and hey, clang is a C++ compiler: it’s highly compatible, it’s portable, it’s standards compliant. So for the C++ compiler we will take clang, which is a Linux-y front end, add a DCL interface to it, and add a handful of VMS features. That’s exactly what we did on Itanium: we took the reference Intel C++ compiler, added a DCL interface, added dual-size pointer support and a bunch of other stuff, some of it easy, some of it taking a couple of months to chew through, and we will do exactly the same again for clang. When we’re done you’ll have a C++ compiler on VMS that supports the current language standard, which makes it easy for people to bring over open source code written in C or C++. And clang, besides being a C++ compiler, is also a C compiler.
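To make the standards gap concrete, here is a minimal sketch (illustrative only, not VSI code) of the kind of post-C++03 source that a C++03-only compiler rejects but a clang-based, C++17-capable front end compiles without complaint:

```cpp
// Illustrative example: features from C++11 and C++17 that a C++03 compiler rejects.
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> standards{{"C++11", 2011}, {"C++14", 2014}, {"C++17", 2017}};

    // Range-based for loop (C++11) plus structured bindings (C++17).
    for (const auto& [name, year] : standards) {
        std::cout << name << " was published in " << year << '\n';
    }

    // if constexpr (C++17): the untaken branch is discarded at compile time.
    if constexpr (sizeof(void*) == 8) {
        std::cout << "Building with 64-bit pointers\n";
    }
    return 0;
}
```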

So you can see, if you have code that’s coming from some open source project or some Linux box, by definition it has no VMS in it; no one is calling SYS$QIO on a Linux system today. You bring all that code over to VMS on x86, you have our C++ compiler based on clang, and it will compile it just fine. So the thing that kept you from moving code to VMS, “I don’t have a modern C++ compiler,” goes away; now we will.

In addition, we will be bringing over a better standard template library. LLVM, besides the clang compiler, has another project called libcxx, which is a standards-compliant library, open sourced under the very flexible LLVM license. I get to take that, so we will have a standards-compliant C++17 standard template library along with a C++17 compiler. We’ll have to update some of the system headers, and we’ll probably have to put a little wax and polish on the CRTL that we have today.
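As a small illustration of what a standards-compliant libcxx brings along with the compiler, here is a hypothetical snippet using the C++17 library types std::optional and std::string_view; the function and file names are made up for the example:

```cpp
// Illustrative example of C++17 library facilities provided by libcxx.
#include <iostream>
#include <optional>
#include <string_view>

// Return a file name's extension, or nothing if it has none.
std::optional<std::string_view> extension(std::string_view name) {
    const auto dot = name.rfind('.');
    if (dot == std::string_view::npos) {
        return std::nullopt;
    }
    return name.substr(dot + 1);
}

int main() {
    for (std::string_view file : {"LOGIN.COM", "SYSTARTUP_VMS", "HELLO.CXX"}) {
        if (const auto ext = extension(file)) {
            std::cout << file << " has extension ." << *ext << '\n';
        } else {
            std::cout << file << " has no extension\n";
        }
    }
    return 0;
}
```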

Q: That’s a fascinating approach. So, changing gears a little, how hard do you think it would be to have a COBLANG or a COBOL compiler?

A: Yeah, well, I certainly don’t think hooking our COBOL compiler onto it is going to be a problem for OpenVMS on x86, but I don’t know if I’m ever going to be able to open source the stuff we inherited through HP.

Q: I am sure a good open source COBOL compiler would have a lot of traction. I mean, there are a few out there; Fujitsu has one, right?

A: There is OpenCOBOL; some people like it, some don’t. It is the same thing in the Pascal space; there is GPascal, Free Pascal, etc. Just from a fun perspective I’d like to open source the Bliss compiler and have Bliss everywhere, because of course there’s really no monetary value to what’s left in the Bliss frontend. There’s nothing in there that’s patentable, it has no market value, and there are no corporate secrets in there; it’s a boring compiler that parses tokens, moves bits around, and generates an intermediate language, just like every other compiler you’ll find on the internet. But as I was just saying, we technically don’t own it, so I technically can’t open source it.

Q: Bliss?  I thought it was open source.

A: No, it’s still got Digital or HP copyrights on it. I would like us to make some agreement with HP to let us do some of these things; I think it would help some of our customers. That would be something to ask VSI management to take up with HP management.

Q: I just installed Bliss on our VMS systems back in our office to test our new Eclipse editor for NXTware Remote, and there was no product license, so I wasn’t sure.

A: That’s right, it doesn’t require a license PAK, but it’s not open source.

Q: OK, let’s move on. I am curious about how the VAX architecture is handled by the compilers; we talked about this earlier. I have always known the VAX to be a stack machine, and MACRO-11 has a lot of instructions for working with the stack. Is this emulated on new architectures like x86?

A: So there is the MACRO-32 compiler for the Alpha and Itanium machines. At one point in the history of VAX/VMS, the whole operating system was written in MACRO-32; there was some Bliss, but the majority was MACRO-32. Over the years that has changed a little. Right now it is mostly Bliss and C, but I think about a third is still MACRO-32. The MACRO compiler on Alpha reads the VAX instructions from the source file and turns them into Alpha instructions. It is relatively straightforward: you have a VAX ADDL3 instruction, which adds two longwords and stores the result in a third operand, whether they are memory locations, registers, or whatever, and the compiler magically does the right thing on each of those architectures.

We are writing another MACRO-32 compiler for x86. It’s generally the same thing, but it goes in through a slightly different interface than the rest of the compilers: we still use LLVM to generate our object files, but we have to do some of the code generation ourselves. MACRO-32 has the peculiar property that a lot of MACRO-32 code knows how to jump from the middle of one routine into the middle of another routine and talk about the same registers. You can’t reasonably represent that in a compiler’s intermediate representation, and you can’t represent it in LLVM; it is hard even to talk about. You can’t write a C program that does a goto from routine A into routine B. You can do setjmp and longjmp and work it that way, but that is high overhead and it doesn’t get you what you want. MACRO-32 does this all the time: come in at the entry point of A, save some registers, jump into B, jump into C, optionally jump into D, and go out the bottom of some other routine. The epilogues of those routines have to know which registers to put back the way they were. So that is something we have to do at a lower level of LLVM; LLVM has an assembler-level interface, as opposed to the higher-level language interface, and we use that.
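For readers who have not run into it, here is a minimal sketch of the setjmp/longjmp pattern John mentions: the closest thing C offers to a non-local jump between routines, and a much heavier mechanism than what MACRO-32 code does natively. The routine names are purely illustrative:

```cpp
// Illustrative only: a non-local jump using setjmp/longjmp, the workaround
// mentioned for C's inability to goto from one routine into another.
#include <csetjmp>
#include <cstdio>

static std::jmp_buf back_to_a;   // records where routine_a can be re-entered

void routine_b() {
    std::puts("in routine_b, jumping back into routine_a");
    std::longjmp(back_to_a, 1);  // control leaves routine_b without returning
}

void routine_a() {
    if (setjmp(back_to_a) == 0) {
        std::puts("in routine_a, calling routine_b");
        routine_b();             // never returns here normally
    } else {
        std::puts("back in routine_a via longjmp");
    }
}

int main() {
    routine_a();
    return 0;
}
```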

GEM had the same kind of back door. It had the GEM intermediate representation that everybody used except MACRO-32; there was a back-door interface into GEM that the MACRO-32 compiler used on Alpha and Itanium. Not surprisingly, LLVM has a similar kind of interface.

Q: So LLVM IR is not used on MACRO-32?

A: No. Unlike the other front ends, which don’t care about their target, a portion of the MACRO-32 compiler cares about the target and a portion doesn’t. The parser for MACRO-32 is just a parser, and the optimizer is just an optimizer. But MACRO-32 has to emulate a VAX, and the VAX has condition codes while Alpha and Itanium don’t. X86 does have condition codes, but they aren’t quite the same as the VAX condition codes: on the VAX just about every instruction sets the condition codes, while on x86 only a subset of instructions set them, and we are leveraging that to get reasonable-quality code for x86. Every architecture is a little different, so MACRO-32 is a little different for each target. It still looks like a VAX to you: you do a PUSHL and things get pushed on the stack, you do a CALLS and the magic happens. We do the right thing on each target, but it is different on every target.

Editor: This concludes the interview with John Reagan on the subject of LLVM for OpenVMS X86.  
