Using AWS EC2 instances for large builds

I experimented a few years ago with using EC2 spot instances (virtual server on the internet, but using unused server capacity). It was fairly successful, being able to run large calculations that should have taken weeks in a matter of days.

Since I started at my current job I’ve been running into building increasingly complex yocto images which keeps growing in size, at this point most images I build can take up to 6-7 hours to build on my laptop. This is an i7-4558U 2.8GHz cpu and 8 gigs of RAM so it’s not bad really, just not enough for these types of builds.

Again I started experimenting and I am really happy and impressed. So far all experiments are for open source projects etc, so nothing that has any non-disclosure agreements or corporate etc etc, I’d like to but this isn’t up to me really. I’ve setup an AMI on EC2 which I can instantiate and have up and running in 2-3 minutes, and then mount a 100 gig EBS volume where I store the sources and build data.

The same build that generally takes up to 6 hours on my laptop takes approximately 30-40 minutes on an EC2 c4.4xlarge machine (16 cores and 32 gigs ram).

My key findings so far is:

  1. Keep an AMI with all the build tools necessary/or script the setup.
  2. Keep an EBS volume with the long term stored data, gits etc for building and mount somewhere suitable in the AMI.
  3. Keep a micro instance (these are free/very cheap) around for mounting the EBS volume when you just want to check stuff out, mess around etc but not make builds.

Iptables-tutorial and ipsysctl-tutorial on github

I guess I should have done this a long long time ago. Both the iptables-tutorial and the ipsysctl-tutorial source code are now available on github. Many many years ago I had an idea of putting the version control system out there for use, but I never realized it for various reasons. Both these documents are old by today, but the basic information is still valid and will likely be for a long time to come it seems.

I apologize for the version history, I moved from CVS in a rather rude way to SVN without keeping the history, which was what I used back in those days.

I invite anyone and everyone to do edits if they wish to and send me pull requests to fix the issues they find, or to add the documentation they’d like to add.

The iptables tutorial is available at:
https://github.com/frznlogic/iptables-tutorial

The ipsysctl tutorial is available at:
https://github.com/frznlogic/ipsysctl-tutorial

Project build speed importance

I began writing this several years ago but never published it for various reasons. I think some of the thoughts are still really interesting, I joined a company named Pelagicore some months ago and we do some rather large builds at this company, similar in build times in the project I refer to in this text, but with some major differences. This build is based on yocto and it actually works really well once you get through the first build (currently a scratch dev image build is up to 4 hours in time, but after that it’s able to rebuild only changed software and recipes as necessary. The scratch build is heavy admittedly, but we are working on some ways to improve that as well, such as sstate caches, icecc distributed build systems, and so forth. I think with modifications we could get those 4 hours down to under an hour, which would be really sweet. Anyways, I thought I’d post this text as is, even though it’s not finished, just because and since I re-read it right now 😉 .

Original text


Again, I’m struck by the importance of a proper build process in projects. No matter how small or big the project is, the build must be kept fast and working from get go to the end — which means its end of life cycle. I am currently working in a large project with a huge code base where the build system is, in my opinion, next to completely collapsed. Build time is in excess of 1-4 hours, and the automated test-suites take from 3 hours to 7 hours to complete, depending on how thorough a test-suite you choose. A gigantic heap of time is spent just…. waiting… waiting…. waiting. Just to make a simple one line edit and then compile to see if it compiles can take up to 4 hours. I’ve previously been a bit spoiled with good and simple build systems, only occasionally running into really crappy build systems — for some reason, a lot of scientific open source software seem to be stuck in this category.

This time, I think I hit pay-dirt on “how not to do it”. Instead of focusing on the bad parts, I will try to focus on the things to do and to keep in mind. Nobody seems to like a grumpy person anyways, which I really am sometimes.

1. Keep the build system simple and manageable. Try to maintain the build system in a logical fashion and in a single language/system (scons/python, Makefiles, etc).
2. Expandable (new directories, files, etc)
3. Scalable (multiple CPU’s, machines, threads, etc)
4. Try to use as few frontends as possible (a single top makefile for example, with targets depending on what you want to do), and keep people informed of changes. Wiki with history via RSS is _perfect_ for this. This doesn’t mean you can create a script to build something, then 2 months down the road delete it because it is no longer needed, it causes a lot of stress for developers who just barely had a chance to find the script.
5. The programmers are (most likely) your customers.

FMEA of Door and conclusions

October 13, 2011 by · Leave a Comment
Filed under: Development, General, Management 
If Cavemen had done an FMEA of the door when it was invented, this would possibly be the outcome.
Item Potential Failure mode Potential Effects of failure Severity Rating Potential Causes Occurence rating
Open door Person behind door gets door in face  – Broken neck
– Death
– Bleeding nose
5-6  – Two persons trying to pass the same door
– Door placed to open into corridor
– -“- onto esplanade
– Badly placed painting on back of door
4
Open door Hidden vicious monster on other side  – Death
– Loss of limb
– Loss of head
10  – Door opens into wilderness
– Glass not yet invented, so other side of door can not be inspected
4
Close door Limbs get stuck between door and doorway  – Loss of limb
– Broken fingers/toes
– broken nails
– Bloodshed
7  – Person is angry and slams door shut
– Annoying salesman sticks foot into door
– Forgot person behind you
5
Run through door Other side opens into chasm
Other side opens up on multistory building
Other side opens up into ocean
 – Falling off high cliff
– Death
– Drowning
– Bludgeoning
– Cuts and bruises
10  – Earth activity causes earthquake and resulting chasm on other side of door
– Balcony fell down
– Global warming caused ocean to rise
– Person was in a rush, running through the door, falling off.
2

Conclusion

If the cavemen would have been engineers, they would never have invented the door, or at least not used it.

Bugs, bugs and more bugs

December 12, 2010 by · Leave a Comment
Filed under: Development, Linux, Projects, Ubuntu 

Lately, I’ve come to realize more and more that bug handling in open source, and specifically in Ubuntu has dramatically declined in efficiency. For years I’ve been extremely satisfied with using Linux because it’s bug free, there has simply not been any serious bugs that I’ve run into. In the last weeks, I’ve run into several more or less serious bugs in Ubuntu, which got me looking at how the bug handling is done.

First off, a few weeks ago, I ran into a bug with Ubuntu 10.10 Ubiquity (the Live CD installer) where I accidentally marked my old /home drive as ext4 when it was ext3 (but not to reformat it). The installer complied happily, and set it up as ext4, but once it got back online, the harddrive was completely wiped. No warning, no nothing. I started looking around, after a while I’ve found several reports on the same matter on launchpad.  For example this and this.

This lead me to take a look at Ubiquity’s other bugs in launchpad, and it’s not very promising. The main installer of Ubuntu 10.10 has 1528 Open bugs as of writing this, of which 846 bugs are new, 35 bugs are marked High importance — and the bugs I found (dare I say, they seem Critical to me, are still not marked with any importance at all). Only 12 bugs are marked as having a patch.

Fine, maybe this is not the poster child of open source. However, the last few days I’ve been severely annoyed by the password popup which is misbehaving. I enter the password, and hit enter (or hit the Authenticate button) and the password field disappears, but the rest of the dialog stays up, and nothing works in it. The only thing you can do is to kill it with the x button. When you do this, you get authenticated…

Since I’m not sure exactly how the authentication is performed in Ubuntu for the update manager etc, I decided to check the update-manager package for Ubuntu on Launchpad. What do I see, if not another package with gigantic mass of bugs filed, but noone dealing with them. 1017 Open bugs, 520 of those are New and 15 marked as High importance. This bug I’ve been having has been reported all over the net, but noone seems to be dealing with it and it isn’t really reported in launchpad. Some computers has it, some doesn’t. It’s nowhere near a critical bug, or even a high importance one, but it’s annoying none the less and it looks extremely crude and comes off giving a fairly unstable feeling.

All this being said, I am wondering how bug handling is done, and how it should be managed on “aggregate” projects such as Debian and Ubuntu. I think the idea is really nice, having upstream bug trackers for each package in the project, but maybe we are spreading too thin having several bug trackers for each minor project? Also, how do we as “normal” users know which package is the reason for the error? I am not so sure it is really the update-manager that is the error in this case, it might as well be some completely other thing behind all that dbus stuff etc. Ie, what is the point of me filing bug reports if I’m not sure they wind up in the right place, or are at all looked after?

Unit testing and stubbing singletons

May 4, 2010 by · Leave a Comment
Filed under: Development, Projects 

I got a bit curious about stubbing Singletons for testing during the weekend as well. We often find ourselves needing to test large codebases at work, and in the current project I’m in, we do complete end to end signal flow tests, but people are finally realizing that this will simply not do. For this reason, we’re doing a lot of work to try to split the entire project up into manageable chunks. One of the main problems has been the incessant use of singletons. A simply half-way-there to doing full out interfaces is to simply make all public function calls virtual and then create a stub class of the singleton saving the message or whatever passed on, into a variable which can be grabbed and tested from the actual unit test.

A sample of the general idea below:

#include 

class A
{
public:
    static A *instance()
    {
        std::cout << "A::instance()" << std::endl;
        if (!s_instance)
            s_instance = new A;
        return s_instance;
    };

    A()
    {
        std::cout << "A::A()" << std::endl;
    };
    // Virtual makes the difference
    virtual void send(int i)
    {
        std::cout << "A::send()" << std::endl;
        // Blah blah, send i or something
    };
    static A *s_instance;
private:
};

class stub_A: public A
{
public:
    static stub_A *instance()
    {
        std::cout << "stub_A::instance()" << std::endl;
        if (!s_instance)
        {
            s_instance = new stub_A;
            A::s_instance = s_instance;
        }
        return s_instance;
    };

    stub_A()
    {
        std::cout << "stub_A::stub_A()" << std::endl;
    };

    void send(int i)
    {
        std::cout << "stub_A::send()" << std::endl;
        y = i;
    };

    int getMessage()
    {
        return y;
    };
private:
    int y;
    static stub_A *s_instance;
};

A *A::s_instance = 0;
stub_A *stub_A::s_instance = 0;


int main(int argc, char **argv)
{
    stub_A::instance()->send(5);
    std::cout << "stub_A::instance()->getMessage() == " << 
        stub_A::instance()->getMessage() << std::endl;

    A::instance()->send(7);
    std::cout << "stub_A::instance()->getMessage() == " << 
        stub_A::instance()->getMessage() << std::endl;
}

Inproductive productivity

For a while I’ve been stuck in slow speed mode again, not really doing great work, just being on average. It feels weird. Don’t really get much done, but I have on the other hand had a great deal of time to test some “new” technologies, well, new as in only 10-15 years old I guess :-). I’ll get back to that later. Also, I’ve begun a new contract at “a big company”.

This is my first time at a really giant hunk of a company, the biggest I’ve seen before was circa 500 people in all, and it moved slower (the beaurocracy) than this in all honesty. This BigCompany is quite interesting to me. Started off with almost 4 weeks of introductions, courses, and so forth. They have a dedicated TEAM of CM’s, that alone is just… wow :-P. I’ve just been put up to speed and started working a little before this weekend so I might be a bit premature, but I like it so far. The weird part is, things happen, but not as I’m used to it. I’m used to 13+hour days and frentic coding/hacking to get things to happen, everyone here eschews away with their 8 hour days — only working overtime at very special occasions — yet slowly things get done, new functionality gets added and so forth.

Another thing that kind of amazes me — and worries me to some extent — is the kind of planning that is done. I’m used to small scale projects with workpackages or task based development, where no workpackage should ever take more than 4-5 days to implement. This place uses a workpackage development structure where each package takes up to 6-7 weeks for 6-10 people to implement. We’ll see how it works out — at least their “stand-up meetings” works :-).

All that being said, I had the time to write quite a bit of python which is a first, then I’ve looked into d-bus architecture which is also a first, and I also looked into Bluetooth and how to use it — some test applications running, fetching services and graphically displaying info about all units it finds etc. The complexity of Bluetooth is rather saddening imho, it’s a horrible protocolstack to work with in some senses, even though I was really impressed by how much python does for you.

I’ve been unused to the whole concept of python before this, and just a tad sceptical. Mainly because of all the problems with version matching that you always wind up having to do, to make anything work properly (try getting scons, trac and wamp, and some more tools working on a win32 machine some day for some fun).

Anyways, I always figured there has to be an upside, and there really is — python is hackfriendly 🙂 . In less than 3-4 hours I went from writing my first simple helloworld to having a scratch written class based graphical (tkinter) interface implementing some very fundamental bluetooth commands. In my world, thats not bad at all ;).

I’ve also had time to learn a lot of new tools at work. I’ll comment on those some other day as I havent seen much other comments on some of them (some is imho very expensive crap with a nice wrappings, while some are completely awesome). Sidenote, I simply adore the systems we are working on 4 xeon with 4cores and 64 gig ram.

I’ll get back later :-).

Work? What work?

January 27, 2009 by · 1 Comment
Filed under: Configuration Management, Linux 

So, just a brief update. I’ve recently (a few months back *cough*) taken over our Linux “education” group at work, and it’s interesting. The sad part is, we mostly only see people who already knows what Linux is as we’re working internally in a world where most people are rather Computer savvy as it is. It’s given me a few new viewing angles though, and I’ll get back to that at a later point.

Currently working on some Trac guidelines for our Change Management process as well. Working from home today to actually get something done with it, as most of the days I wind up getting too many disturbing calls, talks and discussions to be very efficient. Our first two tries at making a decent workflow winded up a bit messy, and I think we really must get this down properly this time.

There are some other things I react on, and want to fix, for example, as it looks now, every single project sets up their own bugtracking/ticketsystem, and every project uses a different system (trac, mantis, clearcase, dimensions, etc). Preferably, this should be centralized in some fashion, and if possible I’d love to get a bit more homogenized environment. As it is, I try to tell people “look, here’s a system for handling your day to day tasks, use it!”. First time, the workflow got overly complex, second shot was also overly complex, and people where put off by all the choices and steps to take. This problem mainly stems from project/change management criterion.

My latest and greatest (yeah yeah) workflow should alleviate some of these problems by making some of the choices less visible to normal users. Ie, we have one task management system and a problem and change management system baked into one, but normal users (programmers) only use the task management system, while the project manager, tech project manager and CM also have the ability to handle problems and changes in separate workflows.

We’re also adding the ability to have supertickets, where a single problem report can contain several tasks. This is a pseudo development so far, as we’re not actually adding the whole deal right now, we’re just adding the idea of it, not bounds checking or views/reports of it. Basically, every ticket can have a superticket (we add a numeric field to the ticket), which can point to another ticket, which is the “parent” ticket. This makes it possible to handle a large and complex bug in several smaller tickets. Anyways, the idea is there, but it’s not fully implemented. If our management likes it, and the others like it, we could implement it for future usage. I’m worried it’s too complex however. At the same time, one complex system might be better than 6 alltogether different systems as it allows for longer time to learn? Kind of like… well, unix for example. Once you find ls, its a darn good bit faster than having to click your way through a whole heap of paths to find the specified file list.

At the same time, both me and PM are a bit tired of Trac’s shortcomings, maybe change to Mantis for example? My general thought to this however is, we need to stick it out i’m afraid… one more system will just make the normal user less interested in the new tool and hence taking even longer to learn. As it is, people use it at a bare minimum cause they dont know it, give them time to learn it properly, and they might come to like it. Comments on this line of thinking?

For now, tata. Back to writing.

Build components

November 23, 2008 by · Leave a Comment
Filed under: Configuration Management, Development 

After a weekend of work, I finally got myself a build component that I’m semi-pleased with, for C and C++ projects, using Subversion. Most likely works for any other lower level programming language as well.

First off, structure. Each component is it’s own BTT(Branches, Tags, Trunk)-root, residing in a Project_Modules directory in subversion. Each component contains an inc, src, test and a stubs directory. Rationale for the BTT-root is that, with a separate BTT-root for each component we can raise the version of each separate component without having to raise it for the entire project.

The Project directory resides on the same level as Project_Modules, and is empty, only containing the subversion property externals pointing to the trunks of the Components in Project_Modules. Rationale for this is to have a simple place to checkout the entire project. It’s a bit dangerous when working with branches, and requires a little bit extra care so one doesnt write into the trunk out of mistake. Possibly block everyone but a specific user to write in the trunks and have that CM person do all the branching/merging. It is time consuming however.

It looks something like this:

  • Project_Modules
    • Component1
      • inc
      • src
      • test
      • stubs
    • Component2
      • inc
      • src
      • test
      • stubs
  • Project

Inc directory is the public interface of the component towards the other components. Src directory contains the actual code of the component. Test contains unit tests (personally, i create a new directory for each new unit test file). Stubs contains the stubs of my own component. Ie, Component1/stubs will contain stubs for the functions in Component1. Rationale being that 95% of the time, we want to stub another component in the same way, instead of keeping stubs of a component in 10 different components, we keep it in one place.

“New” subversion structure using svn:externals

Me and the boss deviced a new structure for the project during the last few weeks, and it’s been slowly refining in our heads until yesterday when we finally implemented it. I think we made a rather refined and complex structure, but once we got it into place physically and once we get the general idea into the developers heads (including me), I think it will prove very powerful.

That being said, I don’t think this is a new structure, I just think people are very quiet about how they use subversion, and it’s a problem. Newcomers do the same old errors over and over again. So, let’s get on to try and explain it all.

Most projects uses a single BTT root, where BTT stands for Branches, Tags and Trunk. Ie, they start a project, and then straight in the root put the BTT, and then inside that, they create the project structure. For example:

  • project-root/
    • Branches
    • Tags
    • Trunk
      • admin
      • src
      • out
      • test

This is a good basic structure for very small projects, containing perhaps 10’ish files, or where the actual implementation is perfectly homogenous and has no need for separated versioning. Every time we want to make a release, we cheap copy the content to Tags as a new tag (called perhaps /Tags/Milestone1-RC1). We now have a release that we can provide to people.

The problem comes if it isn’t so homogenous. For example, let’s say you are developing a calculator. It has two objects, a numpad and a display. What if you want to make a new version just of the display? You need to make a completely new version, including for the numpad.
Or how about wanting to branch just a small part of the project? Ie, I want to use a branch for the numpad, and then use the trunk for the display. You’d then have to make a cheap copy for the entire tree. Admittedly, it isn’t costing too much.

Our “new” structure deals with this on a different level. Basically, the idea is to have multiple BTT roots, and then use svn:externals to connect the correct tags to create
1) a complete releasable project and
2) a complete workarea project.

For the calculator example, you get the following structure:

  • calculator/
    • Calculator_Modules/
      • Display/
        • Branches/
        • Tags/
        • Trunk/
      • Numpad/
        • Branches/
        • Tags/
        • Trunk/
    • Calculator/
      • Branches/
      • Tags/
      • Trunk/

As you can see, it looks much more complex, and it is, but the possibilities are infinitely much better.

The Calculator/Trunk/ directory contains a svn:externals property linking in the Calculator_Modules/Display/Trunk as Display and Calculator_Modules/Numpad/Trunk as Numpad. This works by linking external resources into the current directory structure, so basically I would get the trunks into my Calculator trunk, but properly renamed, without them actually being there in the repository. This also works on “real externals” by the way, such as linking in a specific version of a library from some repository on the Internet.

To create a Calculator/Tags/MS1 we could either just set a -rXX to the correct subversion revision, or we would create svn:externals to the correct Display and Numpad Tags, not their trunk. This way, we can say that “Calculator 1.0 contains Display 2.0 and Numpad 2.1”, not “Calculator 1.0 contains Display revision 439 and Numpad revision 587”, or even worse “Calculator 1.0 is revision 587” which completely lacks granularity.

I’m not completely sure it’s perfect, and others have probably already tested it, but I think it will be pretty sweet :-).

Next Page »