Madefire Press

Toolboxes

Posted by Dan in general, hacking

Craftsmen

If you look at any craftsman who has been working for a while, you’ll find they’ve gathered a toolbox they take with them everywhere. In the physical world it’s a real toolbox, with all the dings, scratches, and well-used tools to go along with it. For software developers it’s a virtual toolbox of scripts, code snippets, and useful notes they can use to improve their workflow and make it easier to get straight to problem-solving.

At Madefire, we’re starting to build a toolbox of our own. As developers, we recognize how much open source software (OSS) contributes to our productivity. While we’re usually focused on our mission, we do commit patches back to the OSS projects we use, such as AFNetworking.

Madefire’s Toolbox

In recognizing that we want to contribute back as much as possible, we’ve published our first bit of open source code in the Madefire toolbox repo. We intend for it to be a collection of anything we create that we find useful. We use a diverse set of languages and tools to create our Motion Books and readers, so the repository won’t be limited to just iOS or web tools; however, our first submission is something for iOS.

UIStoryboard+DeviceNamedFiles

In the Cocoa Touch framework’s UIImage class, the +imageNamed: method has some useful behavior that finds the correct image based on device and screen scale. For example, calling [UIImage imageNamed:@"image"] will return image@2x~iphone.png on a retina iPhone and image~ipad.png on a non-retina iPad. This saves a huge amount of code that would otherwise be needed to cover every device you might run on.

For UIStoryboard files, the developer has to specify at creation whether they are for iPad or iPhone. This leads to one storyboard file for each interface idiom you support. In your code you then have to select between the two, leading to something like this:


UIStoryboard *storyboard;
if (UI_USER_INTERFACE_IDIOM() == UIUserInterfaceIdiomPad) {
    storyboard = [UIStoryboard storyboardWithName:@"iPadStoryboardFilename" bundle:nil];
} else {
    storyboard = [UIStoryboard storyboardWithName:@"iPhoneStoryboardFilename" bundle:nil];
}

As practiced software developers know, that kind of branching and repetition is something to avoid. That’s why we came up with the UIStoryboard+DeviceNamedFiles category. It consists of a single method, +mf_deviceStoryboardWithName:bundle:, which you can use much like UIImage’s +imageNamed: method:


UIStoryboard *storyboard = [UIStoryboard mf_deviceStoryboardWithName:@"StoryboardFilename" bundle:nil];

If you’ve created your storyboard files as StoryboardFilename~ipad.storyboard and StoryboardFilename~iphone.storyboard, then the +mf_deviceStoryboardWithName:bundle: call will return the correct one for each device. If you have a storyboard file without a device specifier (e.g., StoryboardFilename.storyboard) and no device specific file is found, it will return that. If it can’t find a storyboard file to return, nil is returned just like +storyboardWithName:bundle:.
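The lookup order can be sketched in a few lines. This is a Python illustration of the behavior described above, not the Objective-C category itself; the function name and arguments are made up:

```python
def device_storyboard_name(name, idiom, available):
    """Mimic the lookup order described above: prefer a file carrying the
    device specifier for the current idiom, fall back to the bare name,
    and otherwise return None (as the category returns nil)."""
    suffix = "~ipad" if idiom == "pad" else "~iphone"
    for candidate in (name + suffix, name):
        if candidate + ".storyboard" in available:
            return candidate
    return None

files = {"Main~ipad.storyboard", "Main~iphone.storyboard", "Other.storyboard"}
print(device_storyboard_name("Main", "pad", files))      # Main~ipad
print(device_storyboard_name("Other", "phone", files))   # Other
print(device_storyboard_name("Missing", "phone", files)) # None
```

The middle case shows the fallback: with no device-specific file present, the plain storyboard name is used.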

Update: I have a bug report with Apple about including this functionality. You can see bug #12753707 at OpenRadar.

Keeping sane with a build system

Posted by Dan in general

Any software developer can tell you what a nightmare it can be to make sure two consecutive builds come out the same. A lot of pulled hair and late nights stand as a testament to the value of a sane, uniform build system that churns out builds the same way every time. As a small team we have been slowly improving on this aspect of things. Here’s how we’ve added a bit of sanity to the process of developing our iOS app at Madefire.

Jenkins

There are a lot of continuous integration and build systems out there, and we’ve decided to go with Jenkins. While we have the iOS app, we also have other facets of our technology stack that need testing, including server-side and web tools. Jenkins helps make sure we’re not locked into a single platform, since it runs on OS X, Linux, and Windows if needed. Jenkins’ ability to have a master and many slaves we can add as needed is also a big help: if we ever find our builds backing up, we just need to add another machine to the pool.

One of the absolute greatest strengths of Jenkins is the plugin system. Many people have published their hard work extending Jenkins as plugins. The ones we use all the time are the Git, Xcode, S3, Testflight, and Token Macro plugins. Without these we’d have to write our own scripts to handle quite a lot of functionality.

What We Build

We have a slew of build jobs on Jenkins, one for the web services, one for authoring scripts, and four for our iOS builds. For the rest of this post we’re going to focus on the iOS builds we have:

  • Dev — every time we push to master on GitHub this starts a build, mostly as a sanity check to make sure nothing breaks long term.
  • AppStore — when we’re ready for the App Store submission process we build here and then use the result to test and submit.
  • Testflight — when we need an AdHoc build to share with our registered test devices we start this build, and it pushes the final product to Testflight on success.
  • Enterprise — we also have an Enterprise iOS account for all of our test devices so that we don’t take up too many device slots on our regular portal. Like the Testflight build, this pushes the product to Testflight on success.

The Build Machine

For iOS development we’re limited to a Mac as our build machine. For our purposes we put a headless Mac mini (don’t forget to turn on screen sharing so you can get back to it) in the office and let it do its thing. We got the base model with 2GB of RAM. If we need more we can expand later, but for now this is more than enough machine for this use. As usual, we installed Xcode on it and made sure git was usable. We created a builder user whose only job is to run Jenkins as a client and build things.

The world of iOS development requires code signing in order to get an app onto a device. To make this work, we exported our certificates and profiles, including the distribution certificates, and installed them for the builder user. We don’t yet have a system to keep the build machine up to date with provisioning profiles, but cupertino from @mattt is on the list to look at.

Configuring Xcode Projects

We want to build repeatably without worrying about the configuration of our local machines every time. We did this with a combination of Xcode Build Configurations and Schemes. A Scheme per build type lets you have pre- and post-build commands tailored to each type of build (e.g., changing push configurations based on the build type). We also use build configurations to tailor the build settings to each build type. For example, each build configuration can have its own code signing parameters so that we don’t have to worry about which profile and certificate get used. Other things that can change per Build Configuration (and with preprocessing of the Info.plist file) are the app’s display name and bundle identifier.

Xcode Build Configurations

Xcode Build Configs for the Project

Xcode Code Signing Parameters

Xcode Code Signing Parameters for the Target

Once we had the Build Configs defined we created a Scheme for each one. This step may be overkill for your project; you may be able to get away with a Dev, AdHoc, and AppStore Scheme trio and then just flip Build Configurations to get what you need. Once everything is set up in Xcode and you can build and get the results you expect, it’s time to make some Jenkins jobs.

This is Xcode. Jenkins Knows Xcode.

As mentioned above, we have a Dev build that runs every time we push to the master branch on GitHub. We don’t need nearly that much churn for our AppStore, Testflight, and Enterprise builds. In fact, since two-thirds of those builds upload to Testflight, we only start them manually, to avoid a confusing pile of versions on Testflight. The App Store build we run only when we’re ready to submit, never any other time. The Xcode plugin for Jenkins has many fields to consider. From our Xcode configuration work above we have the needed values for Target, Xcode Schema File, and Configuration. The next step is to change the Marketing version, or in Info.plist parlance the CFBundleShortVersionString. Originally we were going to change the Technical version, CFBundleVersion, but Apple requires that to be a number and we use it for content version compatibility checks. In our Info.plist we leave CFBundleVersion alone at whatever our latest App Store release’s version is. All builds for non-engineer use should come from Jenkins.

For the Marketing version field we put a prefix, like "TF-" for Testflight and "E-" for Enterprise, plus the Jenkins variable for the build number. When we look at a build we can tell which Scheme and Configuration were used and, via Jenkins, when it was built. This is set at build time and doesn’t need to be committed to our repo every time we want to build. For our App Store version we have a small script that runs before the build to increment the CFBundleVersion/Technical version, commit that back to the repository, and push it to the origin. That way the build number only increments when Jenkins runs an App Store build, which is manually triggered.
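The pre-build bump step can be sketched with Python’s standard plistlib; this is an illustration, not our exact script, and it omits the git commit-and-push that follows:

```python
import plistlib

def bump_technical_version(plist_path):
    """Increment CFBundleVersion in an Info.plist and return the new value.
    A sketch of the pre-build bump step; the real script also commits the
    change back to the repository and pushes it to origin."""
    with open(plist_path, "rb") as fp:
        info = plistlib.load(fp)
    # CFBundleVersion must be a plain number, so a simple int bump works.
    info["CFBundleVersion"] = str(int(info["CFBundleVersion"]) + 1)
    with open(plist_path, "wb") as fp:
        plistlib.dump(info, fp)
    return info["CFBundleVersion"]
```

Running this as a Jenkins pre-build shell step keeps the increment tied to App Store builds rather than every local build.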

Further down in the Xcode plugin options is Build IPA?. We do indeed want an IPA generated so we can upload it to Testflight or store it on S3 for later retrieval. The plugin will also zip up the dSYM and upload it so your crashes can be symbolicated. The Embedded Profile setting needs to point to a copy of the provisioning profile. You’ll have to have your profiles available to Xcode to code sign; in addition, we keep a copy in the builder user’s home directory (or at least a symlink to the Xcode location) for easy access.

A word of warning on the Xcode plugin distributed by default for Jenkins: there is a known bug that surfaces only if you don’t set the Technical version manually. It is noted in GitHub pull request #9 for the Xcode plugin. Once we’re done with our release cycle we’ll look into contributing effort toward a new release of the plugin, but until then we’ve built a version of the Xcode plugin that contains the patch from the pull request. It is available in the downloads of the Madefire fork of the Xcode plugin.

Previously, we installed the needed code signing certificates in our keychain. Thankfully, the Xcode plugin can unlock the keychain for you: we tick the Unlock Keychain? option and fill in Keychain path and Keychain password. The default login keychain location is ${HOME}/Library/Keychains/login.keychain, and the password is the builder user’s login password. Now Xcode can code sign your builds until the heat death of the universe (extraordinary conditions excepted).

Artifacts For Digital Archeology

At the bottom of the Jenkins job configuration page there is a button that says Add post-build action. We have three post-build actions: Upload artifacts to S3, Upload to Testflight, and Git Publisher.

S3

The S3 plugin is a little different from most Jenkins plugins in that it has a global configuration in the main Jenkins management screen. We configured one S3 profile to upload to and then returned to the jobs. For each one, we selected the configured profile and told it what files to upload with the Source entry. This field must be an Ant-style glob, so we put something like **build/Product Name-Configuration-*.*. That grabs both the IPA and the dSYM zip file. The Destination bucket we grabbed from the S3 interface. For now we decided to configure a single bucket for all builds to go into; we can change that later if we need to.

Testflight

For Testflight uploads the configuration is fairly straightforward. The API Token is your Testflight user’s personal key (we’ll make a builder user on Testflight so these automated builds don’t all look like they’re coming from one person) and the Team Token is the key for the team where builds will be uploaded. For the IPA File path we choose ${WORKSPACE}/build/Product Name-Configuration-Marketing version.ipa. This differs from the default output of the Xcode plugin (see the note about the fix above), which would be ${WORKSPACE}/build/Product Name-Configuration-Technical version.ipa. Having the same Technical version until we run an AppStore build means we would potentially have a lot of files named the same thing but containing different builds. To rectify this, we rename from the Technical version file name to the Marketing version file name in a shell script after the Xcode build has run. Remember that each Xcode project is different and these values will need to be customized to it; they are not variables in Jenkins.

Our Build output directory is set to ${WORKSPACE}/build so that the build products are local to the Jenkins workspace and easily accessible. The dSYM file is named similarly to the IPA as ${WORKSPACE}/build/Product Name-Configuration-Marketing version-dSYM.zip (and we have to rename it the same as the IPA above). For the Build Notes we put a fairly generic string of "Uploaded from Jenkins. (Technical version ${BUILD_ID})". The ${BUILD_ID} variable puts the date and time just in case. That message will help us find uploaded builds and when we’re ready to send that Testflight build out we’ll change the build notes.
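The rename step can be sketched as follows. Our actual script is shell, so this Python version is purely illustrative, and the product/configuration names in it mirror the placeholder patterns above rather than real values:

```python
import os

def rename_build_products(build_dir, product, config, tech_version, marketing_version):
    """Rename the Xcode plugin's output files, which carry the Technical
    version, to the Marketing-version names the Testflight upload step
    expects. Covers both the IPA and the zipped dSYM."""
    for suffix in (".ipa", "-dSYM.zip"):
        src = os.path.join(build_dir, f"{product}-{config}-{tech_version}{suffix}")
        dst = os.path.join(build_dir, f"{product}-{config}-{marketing_version}{suffix}")
        if os.path.exists(src):
            os.rename(src, dst)
```

With this in place, every upload in a Testflight cycle gets a unique filename even though the Technical version hasn’t changed.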

Tagging

We decided that we wanted to keep track of where every Testflight, Enterprise, and AppStore build sits in our tree. The Git plugin has a post-build step called Git Publisher. With it we tag the git repo with a tag like "REL_Technical version", which is then pushed back to our GitHub repo so we can step back through the tree if we need to. We worried about accumulating too many tags, but since these builds are manually triggered we shouldn’t have too many to worry about. If we tagged the builds kicked off by every code push it would likely be just noise, but for our distribution builds it will be a big help down the road when we need to look at the state of the tree for a build.

Final Thoughts

Now we have builds generated by a stable, non-developer machine. They are uploaded to S3 and Testflight automatically, removing another manual step from the process (for AppStore builds we skip the Testflight upload). Every Jenkins job emails our engineering list on failure, and again when it returns to healthy. It’s almost so easy that we can forget it’s there and just let it work for us, as computers are supposed to do.

There are a few things that would make Jenkins configuration easier. To get all of this working we created a single job and tweaked its configuration until it worked, then created new jobs by copying it and changing the few parameters that needed to be unique. It would be great to have common configuration, like the GitHub and Testflight settings, centrally managed for all the redundant parts, leaving only per-job items such as artifact filenames. We’d also like to see more variables from the Token Macro plugin; getting the built product’s version string (e.g., 1.0) anywhere would be great. Possibly the Xcode plugin could be modified to export those values the way the Git plugin does for the repository; that’s something we’ll have to investigate later. Overall, Jenkins is a great tool. Companies from small to large should be doing continuous integration and using a tool like Jenkins to get consistent release builds. It will save you time down the road, and maybe save your bacon in a pinch.

Lies, Damned Lies, and Statistics

Posted by Ross in operations

The Value of Numbers

I’m pretty sure whoever coined the phrase Lies, Damned Lies, and Statistics wasn’t referring to “stats” as used in running production websites and web services.

When it comes to running large scale and highly available systems, stats are your best friend. When things are going well, collecting and graphing performance data can help you to monitor and plan for growth. Once problems arise, it can mean the difference between finding and immediately addressing the underlying issue, or randomly poking and praying.

Push is Your Friend

Push notifications drive engagement. It’s early days here at Madefire, but we’re already sold on the value of push notifications. Our primary use is announcing new titles, and the nearly immediate bumps we get are exciting to watch and make us glad we’re built to scale.

A push notification causes a large spike in hourly visits and continues on for over 24 hours.


Increasing Users Increases Requests

As you would expect, a substantial bump in concurrent users causes a corresponding rise in the number of requests we see at the API. We always make sure someone is around to watch things for a while after we send a notification, to ensure everything runs smoothly. What they watch is the subject of this post.

The large spike in the number of API requests made following a push notification.


We make extensive use of Graphite and Statsd for collecting and viewing stats. We feel it’s essential for all of the data to live in a single place. You can never know ahead of time when you’ll need to see two pieces of data on the same graph in order to quickly (remember: the system is down or under-performing) find a non-obvious root cause. Graphite and Statsd do exactly this. The time elapsed between deciding to track something and having data on a graph is measured in minutes.
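Part of why the turnaround is minutes is that Statsd’s wire protocol is a one-line UDP datagram. A minimal client sketch (the metric name is an example; 8125 is Statsd’s conventional default port):

```python
import socket

def format_counter(bucket, count=1):
    """Statsd's plain-text wire format for a counter: "bucket:value|c"."""
    return f"{bucket}:{count}|c"

def statsd_incr(bucket, count=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment at Statsd over UDP. Because it's
    UDP, instrumented code never blocks or fails if the stats host is down."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(format_counter(bucket, count).encode("ascii"), (host, port))
    sock.close()

statsd_incr("api.requests")  # one more API request seen
```

Dropping a call like that next to any interesting line of code is all it takes to get a new series flowing into Graphite.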

Tracking Numbers Across the Stack

We track things at several levels in order to find problems that span them, or at least problems where the cause lives at one level and the effect at another.

A hypothetical example:

  • An alarm is raised: the load-balancer just started taking twice as long to respond to requests…
  • Check the requests per second. Nothing out of the ordinary there; there’s a small bump, but that may be a result of clients timing out and retrying.
  • Look at the distribution of requests. Nothing out of the ordinary there either.
  • What about the database? Oh, there’s a spike in load at exactly the same time the response time jumped. That’s looking promising.
  • Since we know the API is serving a normal amount and distribution of requests, we’ll assume the DB is too for now. It may not be, and we’d need graphs for that too, but it’s less likely.
  • Let’s look at the DB host’s detailed numbers. IO_milliseconds is way up; maybe a drive is acting up.
  • Look closer, log on to the box, promote a slave to be the new master.

In this made-up scenario we started at the load-balancer and worked our way down to slow IO on a DB’s drive. We began with service-wide stats via the load-balancer (ELB in our case, which we monitor with a script that polls CloudWatch data and passes it along to Statsd using the Statsd client library). The next step took us to the distribution of requests; for that we use the Django-Statsd app. After that we looked at system-level metrics, which Diamond does a great job of collecting for us (though in our case we actually use RDS, so we wouldn’t see disk/IO stats for the DB).

That’s just one pass through our stats. We collect a decent set of information in our backend code that isn’t directly related to requests, and we’ve written and contributed back a Diamond plugin to monitor Memcached instances. Our stats DB records each time code is deployed, a box is added to or removed from our systems, or any number of other operational actions. It’s important to be able to trace problems back to human actions. We even have a monitoring script that checks for AWS outage events via the regions API and throws that data in the mix.

All of this is to say that you never know what will be useful to track until you need it. Your best bet is a system where tracking things is so easy that you just do it for most everything you can think of; that way the data will be waiting for you when you need it.

Information Overload

All of the information in the world isn’t going to help you track down a problem at four in the morning if it’s not organized. In fact, being bombarded with numbers or having to dig through things then is only going to hurt.

That’s where dashboards come in. As you get to know your setup, build out collections of graphs that give you an overview of the system as a whole and let you quickly drill down to an area where you need more information. We start with a main dashboard whose graphs show the overall health of things. From there we have a set of focused dashboards that show more detailed information about pieces of the system, or host-level metrics for groups of boxes.

A view of several graphs on one of our dashboards.

A one-hour window of one of our host-level data dashboards. These are our miscellaneous hosts. They do odd jobs: gateway server; log collection; background workers; and the stats host itself.

That’s Not Normal

The final piece of the process is knowing what things should look like. Once you’ve built your dashboards, this part couldn’t be easier. You just need to look at them all the time. Then you’ll know what to expect things to look like and be able to quickly spot when something is out of the ordinary. It may not always be THE problem, but it’ll almost always point you in the right direction.

iOS App Store update window — users can download an old version

Posted by Dan in general

Recently we released version 1.03 of the Madefire app. Our initial release was limited to the iPad 2 and new iPad using the UIRequiredDeviceCapabilities functionality provided by Apple. With the new release we removed one of the restrictions to allow the app to run on the iPad 1 (in addition to a lot of memory and performance tuning).

We released the app via iTunes Connect and waited for it to be processed to the App Store. As soon as it was, we grabbed an iPad 1 and downloaded it onto the device. Only, what we got wasn’t what we expected.

In the App Store we saw the version listed as 1.03 and the requirements as “Compatible with iPad,” meaning we were no longer restricting iPad 1 installs. Running the app on the iPad 1, we noticed things that should not have been happening on an iPad 1. For a moment we thought we had messed up somehow, that the version we released wasn’t working like the test build. We had a brief moment of panic.

That’s when it dawned on us to check the version number in the app. Surprisingly, it was the previous release, 1.02. In the small window between when the App Store listing had refreshed and when we downloaded the app onto our iPad 1, it was possible to install the old version, the one whose UIRequiredDeviceCapabilities should have prohibited it from being installed.

We were a little concerned, as we didn’t want any customers downloading the old version on an iPad 1, where performance was less than ideal. Returning to the App Store on the device showed an update available for Madefire, so our fears were set aside: any customers who happened to download in that brief window would eventually see the update too.

It’s odd that there is a chance for a new user to download an old version of your app on update, but what’s even more concerning is that the old version installs and runs even though its device capabilities restrictions should prevent it. It’s something to keep an eye on when dealing with support issues.

iPad Video Wall for Comic-Con

Posted by Ross in hacking

Being an effective engineering team is often about making the right trade-offs. When our CEO proposed a 21-iPad video wall for Comic-Con San Diego, everyone was excited at the ridiculousness of it. The initial excitement was tempered a bit by the timeline: with less than two weeks before the convention and plenty of “normal” work to do, we’d have to go the quick and dirty route. But once at the convention, the result would be on display for thousands of people, ten hours a day, for three or more days. Flaky and unreliable was not an option.

18 iPads being configured

18 iPads fresh from the Apple Store. 3 more would join the next day. We’re setting up the iPads, assembly line style. We configured the first, backed it up to iCloud, and restored that to the rest. There was still a bit of manual configuration to do, but it saved a lot of work.

“Real” solutions would involve tightly synchronized clocks or timecode. NTP is made for that purpose, but in the walled garden of iOS apps it wouldn’t be that simple. We’d have to maintain a clock/time in our app that synchronized to a central server frequently and then instruct the app to play a particular video at a specific time. Another option was to provide a syncing signal via the headphone jacks of the devices, but that involved hardware, something we didn’t really have the time to try. In the end, we decided on a simpler approach.

Our initial implementation used a multi-threaded server that connected to each client. The manager had a list of movies to play and their durations. It would loop through the videos, quickly sending out commands telling the clients to start playing, waiting for the video to finish, and then telling the clients what to play next. If the connections were quick and reliable enough, it should work. Thankfully the human visual system is rather forgiving.

Once the iPads were configured we pushed our proof of concept app to all of them and started it up.

The truth is that it worked pretty well. The one change we needed to make was to allow for the time it takes an iPad to load a video before playing it. We addressed that by making it a two-step process: we first tell the clients to prepare, and then, after waiting a sufficient amount of time, tell them to play the video. It’s not perfect; we’ll see a bit of skew on an iPad every once in a while, but you generally have to be looking for it, and even then it’s not objectionable.
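The two-step loop can be sketched like this. The command names and the `broadcast` callback are illustrative, not our actual wire protocol:

```python
import time

def run_wall_pass(playlist, broadcast, load_time=2.0, sleep=time.sleep):
    """One pass of the manager's two-step loop: tell every connected iPad
    to load the next video, give them time to buffer, then tell them all
    to play and wait out the clip. `broadcast` sends one command to every
    client; `playlist` is a list of (filename, duration) pairs."""
    for filename, duration in playlist:
        broadcast(("prepare", filename))
        sleep(load_time)               # let every iPad finish loading
        broadcast(("play", filename))
        sleep(duration)                # wait for the clip to end
```

Injecting `sleep` as a parameter keeps the timing logic testable without actually waiting out each video.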

Our first content was a set of individually crafted videos with a counter on each screen so we could detect drift. That’s not very interesting, so next we needed to cut up real video content to fit the wall. To do this we coded up a Python script that uses avconv/ffmpeg to crop a version of the source video for each iPad to play.

There was a lot of drawing and annotating of pictures involved in getting the slicing right. We went back and forth quite a bit on whether to fit all of the source pixels into the output videos or to allow for the borders by dropping the pixels that fall behind them. The iPad has a huge surround, over 20% of the size of the screen itself. In the end we decided to drop the pixels, as there was too much distortion of the content when there’s a physical jump of 1¾” between rows.
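The geometry of the bezel-dropping approach can be sketched as follows. This is not the script we used; the 20% figure comes from the text above, and treating the gap identically on both axes is a simplification:

```python
def crop_filters(src_w, src_h, cols=7, rows=3, bezel_frac=0.2):
    """Compute one ffmpeg crop filter (crop=w:h:x:y) per iPad in the wall.
    Each inter-screen gap is treated as bezel_frac of a screen, and the
    source pixels that fall behind the bezels are simply dropped."""
    # Width/height of one screen in source pixels, once the gaps between
    # screens are accounted for in the overall span.
    unit_w = src_w / (cols + (cols - 1) * bezel_frac)
    unit_h = src_h / (rows + (rows - 1) * bezel_frac)
    filters = []
    for row in range(rows):
        for col in range(cols):
            x = int(col * unit_w * (1 + bezel_frac))
            y = int(row * unit_h * (1 + bezel_frac))
            filters.append(f"crop={int(unit_w)}:{int(unit_h)}:{x}:{y}")
    return filters

# Each filter then slots into a command like:
#   ffmpeg -i source.mp4 -vf "crop=..." tile_ROW_COL.mp4
```

Skipping the bezel pixels this way is what keeps motion looking continuous as it crosses the physical gap between screens.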

At this point we had a working iPad video wall, but it wasn’t particularly fault tolerant, at least not enough to stand up to ten-hour days in a crowded (in terms of RF traffic) convention hall. The lack of time took away a lot of our options here. We thought about connecting the iPads over USB, maybe even using MIDI, but that route was a non-starter: USB hubs that can power multiple iPads are hard to come by, so we’d lose the ability to charge the iPads while playing.

Once wired connections were ruled out, we were left with making the wireless connections as robust as possible. Stable is one thing, but they will never be fool-proof. To handle hiccups we spent some time hardening all of the code. If an iPad drops out, it immediately starts trying to reconnect so that it’ll be ready for the next round. There were several potential failure cases, and we had to look closely at each to ensure the iPad would retry as expected without beating the server to death. It’s not perfect, but good enough for a couple days of hack-a-thon-style work.

A test run of the 7×3 setup on a table in our office.

We’re pretty proud of the results. If you’re at Comic-Con, stop by our booth, #4902, and check it out. We have some great signings scheduled and you’ll be able to try our app on our demo iPads. Otherwise, we plan to open source the results once we’ve recovered from the event and had a little time to clean up the code. Once we do, the iPad Video Wall code will live on GitHub.

Footage of the wall in our booth at Comic-Con.
