We’re looking to hire our first engineer focused on ops automation. This is primarily a software engineering role, although to be successful you’ll need roots in, and a passion for, systems administration. At blist, systems engineers develop software that automates as much of operations as possible. The goal is fully lights-out operation through automated systems deployment, imaging, monitoring, error detection, and healing. To qualify, you’ll need to meet most of the following prerequisites:
* Ability to design and develop production-quality automation code, preferably in Python but at minimum in Perl
* Solid understanding of, and experience with, PXE-boot-based automated image deployment
* Good familiarity with Xen or other virtualization software
* Good familiarity with systems monitoring software (Nagios, etc.)
* Broad Unix/Linux systems administration experience
* Some experience in database administration is helpful, but not mandatory
* Passion for distributed computing
blist is a well-capitalized startup building database-as-a-service, operated at Internet scale. We’re solving some really interesting challenges and have a terrific team of passionate engineers. If you’re interested in joining us, send your resume along with your solution to the following challenge:
Assume you have a network of 1,000 servers spread across 10 data centers, with 100 servers in each data center. The data centers are in multiple time zones. Write a centralized script that runs on the Linux PC in your office and identifies the server on the network that was most recently rebooted. The output should identify the server, the data center it’s in, the date and time it was last rebooted, and how long the script took to find the results. Your script must finish in less than 5 minutes (300 seconds). At any given time, 2% to 3% of the servers will be offline.
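The Python sketch below shows one way to approach the challenge. It rests on our own assumptions, not blist’s. First, the office PC has key-based SSH access to every server. Second, every server runs Linux and exposes the `btime` line of `/proc/stat`, which gives the boot time in seconds since the Unix epoch. Third, you supply the host-to-data-center mapping from some inventory. Because `btime` is an absolute UTC timestamp, servers in different time zones can be compared directly without conversion. The script polls hosts concurrently through a thread pool. If a host times out or fails, it is counted as offline instead of aborting the run. The helper names (`parse_btime`, `ssh_boot_time`, `most_recent_reboot`, `format_report`) are hypothetical.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Tuple

# (host, data center, boot time in UTC) for the most recently rebooted server
Winner = Optional[Tuple[str, str, datetime]]


def parse_btime(proc_stat: str) -> int:
    """Return boot time (seconds since the Unix epoch) from /proc/stat text."""
    for line in proc_stat.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("no btime line in /proc/stat output")


def ssh_boot_time(host: str, timeout: float = 10.0) -> Optional[int]:
    """Read a host's boot time over SSH; return None if it cannot be reached.

    Assumes key-based SSH access (BatchMode disables password prompts).
    """
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
           host, "cat /proc/stat"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return None  # hung connection or missing ssh binary
    if result.returncode != 0:
        return None  # host offline, connection refused, or auth failure
    try:
        return parse_btime(result.stdout)
    except ValueError:
        return None


def most_recent_reboot(
    servers: Dict[str, str],
    fetch: Callable[[str], Optional[int]] = ssh_boot_time,
    workers: int = 100,
) -> Tuple[Winner, int, float]:
    """Poll every host concurrently; return (winner, offline_count, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each host with its result
        boot_times = dict(zip(servers, pool.map(fetch, servers)))
    elapsed = time.monotonic() - start

    online = {h: t for h, t in boot_times.items() if t is not None}
    offline = len(servers) - len(online)
    if not online:
        return None, offline, elapsed
    host = max(online, key=online.get)  # largest epoch value = latest boot
    when = datetime.fromtimestamp(online[host], tz=timezone.utc)
    return (host, servers[host], when), offline, elapsed


def format_report(winner: Winner, offline: int, elapsed: float) -> str:
    """Render the result as a single human-readable line."""
    if winner is None:
        return f"No servers responded ({offline} offline); ran {elapsed:.1f} s"
    host, dc, when = winner
    return (f"Most recent reboot: {host} in {dc} at {when:%Y-%m-%d %H:%M:%S} UTC "
            f"({offline} offline); ran {elapsed:.1f} s")
```

A caller would build the `servers` mapping, run `winner, offline, elapsed = most_recent_reboot(servers)`, and print `format_report(winner, offline, elapsed)`. The pool uses 100 workers, and each host is capped at 10 seconds. So even if every host hit the cap, the worst case should be roughly 110 seconds: 1,000 hosts × 10 s spread over 100 workers, plus one straggler. That is inside the 300-second limit. To change the transport, for example to an agent or a monitoring database, pass a different `fetch` function; the aggregation logic stays the same.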
Your solution should include:
* A description of your assumptions about network topology: how your Linux PC connects to each server.
* Any other base assumptions you make about the servers in the network.
* A description and, if appropriate, the layout of any configuration file(s) you’ll need to solve the problem.
* Commercial-quality Perl or Python code that solves the problem and prints the results.
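One possible layout for the configuration file is an INI file with one section per data center, each holding a whitespace-separated `hosts` list. This layout is our assumption; the challenge does not specify one. Python's standard `configparser` module can read it. The section and host names below are placeholders, and `load_inventory` is a hypothetical helper. The helper returns a host-to-data-center mapping. It rejects any host listed under two data centers, because that would make the report ambiguous.

```python
import configparser
from typing import Dict

# Hypothetical inventory: one section per data center; host lists may span lines.
EXAMPLE_CONFIG = """
[dc-sea]
hosts =
    sea-001
    sea-002
    sea-003

[dc-nyc]
hosts = nyc-001 nyc-002
"""


def load_inventory(text: str) -> Dict[str, str]:
    """Parse the INI text into a {host: data_center} mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    inventory: Dict[str, str] = {}
    for dc in parser.sections():
        # split() handles both spaces and the newlines of multi-line values
        for host in parser.get(dc, "hosts").split():
            if host in inventory:
                raise ValueError(f"{host} is listed in both {inventory[host]} and {dc}")
            inventory[host] = dc
    return inventory
```

In practice you would read the file from disk and pass its contents to `load_inventory`. With 100 hosts per data center, the multi-line `hosts =` form keeps each section readable.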
If ops automation at Internet scale is your passion, drop us a note. We’d love to hear from you.