Planfile
This planfile documents the various steps towards the initial release of Ampify. If you'd like to help make it a reality, do join us on the #esp channel on irc.freenode.net.
— Thanks, tav <tav@espians.com>
CoffeeScript
Ampify will support custom services written in CoffeeScript syntax. This is seen as a better alternative to offering JavaScript directly. For example, compare:
listBrowsers = (browsers) ->
  list = $('<ul />')
  for [id, name, url] in browsers
    $("""<li id="#{id}"><a href="#{url}">#{name}</a></li>""").appendTo list
  return list
With the resulting JavaScript:
var listBrowsers;
listBrowsers = function(browsers) {
  var _a, _b, _c, _d, id, list, name, url;
  list = $('<ul />');
  _b = browsers;
  for (_a = 0, _d = _b.length; _a < _d; _a++) {
    _c = _b[_a];
    id = _c[0];
    name = _c[1];
    url = _c[2];
    $("<li id=\"" + (id) + "\"><a href=\"" + (url) + "\">" + (name) + "</a></li>").appendTo(list);
  }
  return list;
};
Overview of the Architecture
The Ampify network is built up of independent instances (Hosts) and a minimal registry at amphub.org. The vision is that, eventually, every individual would run their own Host.
AmpHub
AmpHub is the result of a grand compromise between the Ministry of Decentralisation and the Ministry of Usability.
Ampify Zero
A Host can be run from a single machine, e.g. on a mobile device or on someone's laptop, or it can be deployed across multiple servers — even across multiple datacenters.
Hosts are issued with a unique ID number when they first register their public key with amphub. They can later use their private key to authenticate themselves to other Hosts and use their control key to make updates at the amphub registry regarding their:
- current location (domain name/ip address + port)
- public key
- X.509 certificate for HTTPS frontends
So, for Host 23 to talk to Host 51, it'd first ask amphub for Host 51's current location, key and certificate. It'd then use that information to establish and verify a connection to Host 51.
Users are also similarly issued with a unique ID number when they first register their public key with amphub. They can later use their control key to make updates at amphub regarding their:
- current Host's ID number
- public key
So, for user 42 to send a message to user 140, user 42's Host would first look up the current Host for user 140, and then the current location for the Host before sending the message to it.
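The two-step lookup described above can be sketched roughly as follows. This is only an illustration: the in-memory `REGISTRY` dictionary stands in for amphub, and the record layout, the example locations and the function names are assumptions made for the sketch, not part of any Ampify API.

```python
# Hypothetical in-memory stand-in for the amphub registry.
REGISTRY = {
    "hosts": {
        23: {"location": ("host23.example.org", 8040), "public_key": "<key-h23>"},
        51: {"location": ("203.0.113.7", 8040), "public_key": "<key-h51>"},
    },
    "users": {
        42: {"host": 23, "public_key": "<key-u42>"},
        140: {"host": 51, "public_key": "<key-u140>"},
    },
}


def resolve_host(host_id):
    """Return the (address, port) location currently registered for a Host."""
    return REGISTRY["hosts"][host_id]["location"]


def resolve_user(user_id):
    """Return (host_id, location) for the user's current Host."""
    host_id = REGISTRY["users"][user_id]["host"]
    return host_id, resolve_host(host_id)
```

With this data, `resolve_user(140)` first finds Host 51 and then that Host's registered location, which is where user 42's Host would send the message.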
It's important to note that in Ampify, unlike existing systems like email and many “decentralised web” initiatives, the user's ID isn't tied down to any specific Host.
This important distinction means that users have the freedom to move Hosts without inconveniences like telling everyone that they now have a new ID at a different Host.
This feature will also allow for some of the funkier Ampify 0.x functionality, e.g. being able to specify multiple current Hosts for a user — so that data can be accessible even when users are offline!
The role of amphub will also become less prominent during the development of Ampify 0.x. The introduction of the Amp Routing Protocol will allow for Host location and user Host updates to happen in a completely peer-to-peer manner.
This will leave amphub to simply act as a registry of “dumb numbers” to public keys. And various measures will be put in place to counter any denial-of-service attacks (on both legal and technical fronts).
In contrast to Ampify 0.x, which will use IPv6 (using a Teredo-like service when native IPv6 isn't available), connections in Zero happen over IPv4.
Similarly, in contrast to using a UDP-based transport protocol with LEDBAT-like congestion control and SPDY-like framing with TLS-like crypto using OpenPGP-like certificates, Zero uses traditional HTTPS connections.
The location of a Host in Ampify Zero is defined as a set of either of the following pairs:
- FQDN (absolute domain name) + port
- IP address + port
It is expected that Hosts running on user devices will tend to use raw IP addresses, whereas Hosts running off servers will tend to use absolute domain names.
Since most “home users” will be behind NAT devices, Zero supports port mapping and NAT traversal using either UPnP IGD or NAT-PMP. If those don't work, it'd be up to the Host admins to manually configure their routers.
Ampify 0.x will have more comprehensive NAT traversal support and this will be tightly integrated with its use of tunneling when native IPv6 connectivity isn't available.
Ampify Zero frontends bind to an HTTPS server at port 8040 by default. Host admins are expected to manually configure any LVS or similar proxies/load balancers to map port 443 to this port if they want to maximise connectivity.
Admins can take advantage of a number of both deployment and remote monitoring script hooks to simplify any such topology related management.
Once a connection gets through, it is handled by nginx (the HTTPS frontend) which serves 3 purposes:
- Serve “root” static files, e.g. robots.txt, ampzero.js, etc.
- Proxy requests to ampnode instances
- Serve error pages, e.g. 50x when ampnode instances are down
In contrast to Ampify 0.x, where requests will be dispatched to a request specific “nodule” app (which may even be compiled dynamically) on an appropriate internal Host node, the Ampify Zero design is super simple.
Nginx will proxy requests to an ampnode instance running on the same server. These instances will be homogeneous across a Host. That is, every single instance will be able to handle the exact same set of requests.
While this doesn't take advantage of locality in the way that 0.x would, it certainly makes for a simpler design. The ampnode instance is basically a combined event-driven and multi-threaded Python web server.
These instances are much like “app servers” and can be used in a manner similar to “modern” web app frameworks like Rails and Django. Hosts can define their own services to complement the built-in ones which all Ampnodes are expected to provide.
This is quite feeble in contrast to the intended LXC + Native Client sandboxed capability to run any arbitrary application code in Ampify 0.x, but it's also a lot simpler ;p
On startup, each ampnode instance creates a Redis sub-process for use as a shared memory store by various services. It also keeps in contact with a Keyspace cluster in order to partition and load balance the lexicographically-ordered “key space” of the Redis servers.
For example, imagine there are four ampnode instances with a Redis server each. After a while, the respective “key space” they'd be responsible for might be split up like:
- redis-1, for all keys starting with A-F
- redis-2, for all keys starting with G-K
- redis-3, for all keys starting with L-S
- redis-4, for all keys starting with T-Z
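As a rough sketch of how a key could be mapped to the responsible server under such a split, the following uses a sorted list of range boundaries and a binary search. The boundaries mirror the example above; the function name and the restriction to keys starting with A-Z are assumptions of the sketch.

```python
import bisect

# Last first-letter (inclusive) handled by each Redis server, mirroring the
# example split above.
RANGE_ENDS = ["F", "K", "S", "Z"]
SERVERS = ["redis-1", "redis-2", "redis-3", "redis-4"]


def server_for_key(key):
    """Return the server responsible for a key, based on its first letter."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("this sketch only handles keys starting with A-Z")
    # bisect_left finds the first range whose end is >= the key's first letter.
    return SERVERS[bisect.bisect_left(RANGE_ENDS, first)]
```

Rebalancing the key space then amounts to changing `RANGE_ENDS`, which is the kind of shared state the instances would need to agree on.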
Keyspace provides a Paxos-based lease mechanism and a nice strongly consistent datastore without a single-point-of-failure. The ampnode instances use it as a co-ordination space to manage 10-second leases of the “key space”.
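To make the lease idea concrete, here is a minimal single-process sketch of time-limited ownership of a key range. It is not Paxos and it is not how Keyspace is implemented; the `LeaseTable` class and its methods are hypothetical, and only the 10-second duration comes from the text.

```python
import time

LEASE_SECONDS = 10  # lease length mentioned in the text


class LeaseTable:
    """Single-process stand-in for a lease service; not a Paxos implementation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # key range -> (owner, expiry time)

    def acquire(self, key_range, owner):
        """Grant or renew the lease unless another owner holds a live one."""
        now = self._clock()
        holder = self._leases.get(key_range)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        self._leases[key_range] = (owner, now + LEASE_SECONDS)
        return True

    def owner(self, key_range):
        """Return the current lease holder, or None if the lease has expired."""
        holder = self._leases.get(key_range)
        if holder is None or holder[1] <= self._clock():
            return None
        return holder[0]
```

An instance that stops renewing its lease, e.g. because it died, simply loses the range once the 10 seconds are up, at which point another instance can acquire it.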
Note that, in contrast to the secure, asynchronous, Argonought encoded calls to other nodes in Ampify 0.x, Zero uses unencrypted TCP and process-specific protocols to talk to Redis/Keyspace.
It is up to Host admins to secure the network using OpenVPN or something similar if instances will be communicating over the public internet.
Some “state” is held/cached by Ampnodes, e.g. in Redis, as persistent connections, in the built-in filestorage service, etc. These are designed to be “revivable”. That is, should the Ampnode die permanently, it should not cause any problems.
This is achieved by careful design of the built-in services and the use of durable stores, e.g. App Engine datastores for Ampify Items and S3 for files.
However, due to latency issues and limitations of the various durable stores, Ampnodes intelligently store copies “locally” in a manner suitable for Ampify applications.
Two custom App Engine applications have also been developed to help on this front. One is called zerodata — it provides a minimal wrapper around the App Engine datastore for storing Ampify Items and allows for both transactional writes and parallel queries to be executed.
The other is called logstore. This is intended to be used by the various Ampnodes as the place to log access, usage and errors. The advantage to writing to logstore instead of to local log files is two-fold:
- It provides a centralised location to see all the logs
- It allows for rich reports to be generated using mapreduce
If, for any reason, App Engine should be down, Ampnodes will try to provide as much of a working instance as possible, e.g. various bits of data will still be available in read-only mode and the instances will log to local files temporarily.
The use of proprietary platforms like App Engine and S3 is only a temporary measure for Ampify Zero. The 0.x line will see the development of a live datastore called ampstore that is scalable, strongly consistent, richly queryable and suited for data warehousing.
Overview of the Architecture 2
The Ampify Zero application frontend is primarily written in Naaga — a language that compiles down to static JavaScript files for execution on a user's web browser. The application makes either traditional or Argonought-encoded requests to the backend “Nodule processes” over HTTP.
These Nodules provide a number of services — from interfacing with an App Engine powered datastore to event routing to trust metric calculations — and are automatically built and deployed across servers by a supervisor redpill daemon.
                              .----------------,
                              |                |
                              | Naaga-based UI |
                              |                |
                              .----------------,
+---------------+      (1)     \                \
| Load Balancer |    <----->    \     Client     \    <---o
+---------------+                `----------------'       |
   |      |                                               | (9)
   |      | (2)                                           |
   |  +--------------------+                              |
   |  | Naaga Static Files |                   +--------------+
   |  +--------------------+              o----| Event Stream |
   |                                      |    +--------------+
   |                +--------------+      |
   |             o--| Config Nodes |      |    +------------+
   | (3)         |  +--------------+      o----| App Engine |
   |             |            |           |    +------------+
+------------+   | (4)        |           |
| Dispatcher |---o            | (8)       |    +-----------+
+------------+                |           o----| Amazon S3 |
      \                       |           |    +-----------+
       \                      |-----------|
        \                     |           |    +-------------+
         \ (5)                |           o----| Other Nodes |
          \                   |   (7)          +-------------+
           \                  |
+===========\=================|=================================+
|            \                |                                 |
|             +----------------------+                          |
|             | Node: Redpill Daemon |                          |
|             +----------------------+                          |
|                               |                               |
|                               | (6)                           |
|     +-------------------------+-----------------------+       |
|     |                         |                       |       |
| +----------------+            |           +----------------+  |
| | Nodule Process |   +----------------+   | Nodule Process |  |
| +----------------+   | Nodule Process |   +----------------+  |
|                      +----------------+                       |
|                                                               |
+===============================================================+
Legend
(1) Client initiates a request.
(2) If the request was for a Naaga static file, it is served.
(3) All other requests (service requests) are handled by a special Dispatcher.
(4) The Dispatcher queries Config Nodes to find out which Nodes can handle the service request.
(5) The Dispatcher converts the request to an Argonought-encoded service call and forwards it to the appropriate Node.
(6) A Nodule process within the Node handles the service call and, depending on the nature of the request, responds immediately or once it has finished.
(7) The Nodule may also make further service calls. These would be mediated by its local Redpill daemon, which may make queries to Config Nodes to figure out where to send the call.
(8) Some calls will be traditional HTTP requests to external services and as such will not be mediated.
(9) The Client will also, on startup, make a connection to an Event Stream which some service calls may update in order to send messages to Clients asynchronously.
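Steps (2) to (5) of the legend can be sketched as a single routing function. Everything concrete here is an assumption made for illustration: the `STATIC_FILES` and `CONFIG_NODES` tables, the mapping from URL path to service name, and the choice of simply picking the first capable Node. The Argonought encoding itself is left out.

```python
# Hypothetical data: which static files exist and which Nodes serve a service.
STATIC_FILES = {"/app.js": b"// compiled Naaga output"}
CONFIG_NODES = {"trust.compute": ["node-a", "node-b"], "event.route": ["node-c"]}


def handle_request(path, payload=None):
    """Follow steps (2) to (5) of the legend for a single request."""
    # (2) Static files are served directly.
    if path in STATIC_FILES:
        return ("static", STATIC_FILES[path])
    # (3) Everything else is treated as a service request.
    service = path.strip("/").replace("/", ".")
    # (4) Ask the config data which Nodes can handle the service.
    nodes = CONFIG_NODES.get(service)
    if not nodes:
        return ("error", "no Node registered for %s" % service)
    # (5) Forward the call to one of those Nodes (here: simply the first).
    return ("forward", nodes[0], {"service": service, "payload": payload})
```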
Naaga
✗ Implement an Emacs major mode for Naaga.
Editing Naaga files should be a pleasure in Emacs and the mode should require nothing more than the following in the user's .emacs file to work:
(add-to-list 'load-path "path/to/naaga-mode")
(require 'naaga-mode)
(add-to-list 'auto-mode-alist '("\\.naaga$" . naaga-mode))
The mode should support syntax highlighting, automatic indentation, electric backspace and at least the following commands, which should be bindable to custom keys:
- naaga-compile-buffer
- naaga-execute-buffer
- naaga-shift-region-left
- naaga-shift-region-right
And, finally, the mode should integrate well with js2-mode and support basic hooks for extensibility:
(defun naaga-custom ()
  "naaga-mode-hook"
  (set (make-local-variable 'tab-width) 2))
(add-hook 'naaga-mode-hook
          '(lambda() (naaga-custom)))
For bonus points, the mode should integrate well with naaga.test.
Zerodata
✗ Define the Message type structure.
The Message type is the heart of Ampify. Here's a start at its definition:
type Message struct {
    from         string     // who is the message from?
    by           string     // who is the message authored by?
    to           string     // where is the message sent to?
    aspect       string     // what /aspect is being defined?
    content      string     // the raw message body
    value_number big.Number // the message value
    value_list   []string
    version      int
}
It combines a number of different data models into a pretty flexible generic one.
Argonought
✗ Define the Argonought serialisation and exchange format.
There are literally hundreds of serialisation/exchange formats/protocols available to us: ASN.1, Avro, BERT, BSON, Etch, Gob, Hessian, JSON, MessagePack, Pickle, Protocol Buffers, S-Expressions, SOAP, Thrift, XDR, XML-RPC, YAML, &c.
They are all brilliant in their own way. But, unfortunately, none of them offer the full set of features that would be ideal and provided by Argonought:
- JSON-like simplicity.
- Efficient binary encoding.
- Rich set of builtin types — currencies, location, sets, &c.
- Optional streaming support.
- Support for synchronous/async calls.
- Ability to define arbitrary environment/headers.
- Architecture independent representation.
- No need to escape binary data.
- Lexicographically sortable encoding.
- Dynamic (No IDLs or code generation).
BERT and BERT-RPC from GitHub are the closest existing formats to the above ideal. Unfortunately, they don't support the much-needed lexicographically sortable encoding or the full set of desired builtin types.
It may also be worth pointing out the downsides to the CORBA-esque formats like Protocol Buffers and Thrift. These depend on specifications being defined using an IDL and code being generated to support the specification in target languages.
This can definitely be attractive in certain contexts, but it can also prove to be a right pain as developers have to take on the burden of constantly keeping both the IDL generated code and the app code up-to-date at multiple locations.
For example, with Protocol Buffers, you'd have to first create a specification:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}
Then you'd have to generate the code for your target languages before you can start doing anything, e.g.
person = Person()
person.name = "John Doe"
person.id = 1234
person.email = "jdoe@example.com"
In contrast, consider the simplicity of JSON:
{ "type": "Person",
  "name": "John Doe",
  "id": 1234,
  "email": "jdoe@example.com" }
It allows for dynamic development as there's no intermediate step. And should the developer wish to add a new field, there'd be no need to update the spec and the generated code in all target languages before making changes to the actual application code!
✓ Implement support for lexicographically sortable representation of numbers.
When dealing with numbers, most people don't care whether it's an unsigned 32-bit integer or a 128-bit decimal. Even programmers have been spoilt by automatic type coercion in many dynamic languages, e.g.
>> 1000000.class
=> Fixnum
>> 10000000000.class
=> Bignum
Unfortunately this support is not reflected in most datastores. App Engine, for example, indexes strings up to 500 bytes in length but truncates integer values greater than 64 bits.
If one were to try and encode the number as a string to take advantage of the extra space, the naive approach fails quite quickly:
>> '9' > '8'
=> true
>> '1000' > '9'
=> false
The official documentation for SimpleDB even recommends the usual tricks of negative-number offsetting and zero padding as a solution to this problem:
>> '1000' > '0009'
=> true
But this approach gets a massive FAIL since it not only requires predefining the largest possible value (the exact problem we're trying to avoid!), but also ends up wasting a lot of space due to the zero padding.
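For illustration, the offset-and-pad trick looks roughly like this in Python. `WIDTH` and `OFFSET` are arbitrary values picked for the sketch; having to fix them up front is exactly the limitation described above.

```python
WIDTH = 10        # assumed maximum number of digits
OFFSET = 10 ** 9  # assumed bound on the magnitude of negative numbers


def pad_encode(number):
    """Encode an integer as a fixed-width, zero-padded decimal string."""
    shifted = number + OFFSET  # shift negatives into the non-negative range
    if shifted < 0 or len(str(shifted)) > WIDTH:
        raise ValueError("number outside the predefined range")
    return str(shifted).zfill(WIDTH)
```

Every value, however small, now costs `WIDTH` characters, and anything outside the chosen range can't be stored at all.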
To solve these and other problems, Argonought implements a very efficient encoding of numbers so that they can be sorted in lexicographical order:
>>> argonought.pack_number(8234364) # takes just 3 bytes!
'\xfe\xa2\xa0'
>>> argonought.pack_number(8234364) > argonought.pack_number(-234364)
True
It can handle, subject to memory limits, numbers of whatever size, including arbitrary precision decimals like 1.618033988749894848204586834365638.
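To show that an order-preserving, variable-length encoding is possible at all, here is a minimal sketch for integers. It is not Argonought's actual encoding (it is less compact, and `pack_int` is a made-up name); it simply prefixes the big-endian magnitude with a length-dependent header byte and complements the bytes of negative numbers.

```python
def pack_int(number):
    """Encode an integer so that byte-wise order matches numeric order."""
    magnitude = abs(number)
    length = max(1, (magnitude.bit_length() + 7) // 8)
    if length > 127:
        raise ValueError("this sketch only supports integers up to 127 bytes")
    body = magnitude.to_bytes(length, "big")
    if number >= 0:
        # Longer (hence larger) non-negative numbers get a larger header byte.
        return bytes([0x80 + length]) + body
    # Negative numbers: a smaller header for longer magnitudes, and the body
    # is complemented so that a larger magnitude sorts earlier.
    return bytes([0x80 - length]) + bytes(0xFF - b for b in body)
```

No maximum width has to be chosen in advance (beyond the sketch's 127-byte cap), and small numbers stay small: `pack_int(8234364)` takes four bytes here.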
✗ Define a plexname normalisation algorithm.
Plexnames should be comparable.
from unicodedata import category, normalize as normalise_unicode

# CASE_MAP, SPECIALS and register_service are assumed to be defined elsewhere.
@register_service()
def get_canonical_plexname(plexname):
    """Return a canonical form of a plexname."""
    if u'\u0345' in plexname: # COMBINING GREEK YPOGEGRAMMENI
        plexname = normalise_unicode('NFD', plexname)
    canonised = []
    space = False
    for char in plexname:
        if (category(char) in ['Cc', 'Zs']) or (ord(char) in SPECIALS):
            space = True
            continue
        if space:
            space = False
            if canonised:
                canonised.append(u'-')
        if char in CASE_MAP:
            canonised.append(CASE_MAP[char])
        else:
            canonised.append(char)
    plexname = normalise_unicode('NFKD', u''.join(canonised))
    canonised[:] = []
    for char in plexname:
        if char in CASE_MAP:
            canonised.append(CASE_MAP[char])
        else:
            canonised.append(char)
    return normalise_unicode('NFKC', u''.join(canonised))
And, there's a lot of blah.
Development Process
✓ Investigate code review and continuous integration tools.
There are numerous continuous integration tools and services out there. Of the commercial ones, TeamCity is a nice offering — but is limited to just Java, .NET and Ruby. Hudson and Buildbot are very impressive open source offerings but only handle the build, test and notification phases.
The Go project has a nice custom build status dashboard which is worth a mention. On the browser-based Javascript front, TestSwarm offers a glimpse of the future — distributed testing by casual visitors of various bits of Javascript code!
From a code review perspective, nothing seems to beat rietveld which is also hosted at codereview.appspot.com. Unfortunately its user interface isn't exactly appealing. In contrast, GitHub has a really sexy interface for making commit comments that can be aggregated in compare views.
With regard to development practices relating to these tools, the Chromium project is quite exemplary:
- Various presubmit scripts are run before code is submitted for review.
- A really simple watchlist mechanism allows for interested developers of specific code paths to be automatically cc'ed into the review.
- Developers can run their uncommitted patches on try servers to check whether they work on different platforms.
- There is even a process for tree sheriffs to maintain a healthy tree status when committed builds fail for some reason.
✓ Implement a planfile generator and client-side app.
Planfiles are intended as a happy medium between a minimal TODO text file and a full blown web-based issue tracker. Two complementary reStructuredText directives have been created to support the creation of HTML planfiles from the plain text variant.
The plan directive inserts some Javascript into generated HTML files for a client-side web application which will allow viewers to filter plan items based on tags. The app has special support for bookmark-able filter URLs via the fragment identifier and will degrade gracefully on older browsers.
.. plan:: project-id
It does, however, expect certain CSS classes to be defined and for the plan.js file to be accessible at /js/plan.js.
Individual plan items are easy to define. Just nest them inside the tag directive, which allows for plan items to be tagged with arbitrary tags, e.g.
.. tag:: #dev, #ui, @tav, WIP
✗ Implement a planfile generator and client-side app.
This is what a sample plan item would look like.
There is special support for #hashtags, @names and even milestone: and dep: style tags. These will have different CSS classes and can therefore be visually disambiguated easily. Displaying of items based on dependency analysis isn't there yet, but could be added relatively easily.
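A minimal sketch of how the argument of a tag directive might be split into the different kinds of tags follows. The kind names (`hashtag`, `name`, `plain`, and so on) and both function names are invented for the sketch; the real directive may classify tags differently.

```python
def classify_tag(tag):
    """Return a (kind, value) pair for a single planfile tag."""
    tag = tag.strip()
    if tag.startswith("#"):
        return ("hashtag", tag[1:])
    if tag.startswith("@"):
        return ("name", tag[1:])
    for prefix in ("milestone", "dep"):
        if tag.startswith(prefix + ":"):
            return (prefix, tag[len(prefix) + 1:])
    return ("plain", tag)


def parse_tags(argument):
    """Split the comma-separated argument of a tag directive."""
    return [classify_tag(part) for part in argument.split(",") if part.strip()]
```

Each kind could then be mapped to its own CSS class when the HTML is generated.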
✓ Implement a static site generator for our documentation/sites.
In our experimentation with using Markdown and Jekyll, Jekyll was great but Markdown didn't suit our particular use. So we updated yatiblog, our ancient reStructuredText based static site generator, with Jekyll's YAML front matter feature — which is working out quite well.
When yatiblog is run, it converts all the .txt files in the directory to .html files in a website subdirectory according to the layout variable specified in each txt file's YAML front matter, e.g.
---
layout: page
title: Zero Planfile
---
All other specified variables are passed on to the layout files, which are Genshi templates within a special _layouts directory. So in the above example, page.genshi could make use of ${title}. A layout can also define a layout for itself, allowing ease of re-use of templates.
The variables defined on a page also shadow the variables defined in a layout, so one can have default values for variables which can be overridden on a per page/layout basis.
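The shadowing behaviour can be sketched with a `ChainMap`, where earlier mappings win over later ones. The function name and the idea of passing the layouts as an ordered chain are assumptions of the sketch, not a description of yatiblog's internals.

```python
from collections import ChainMap


def resolve_variables(page_vars, *layout_vars):
    """Merge variables so that page values shadow those of its layouts.

    layout_vars are given innermost layout first; earlier mappings win.
    """
    return dict(ChainMap(page_vars, *layout_vars))
```

So a page that sets `title` gets its own title, while still inheriting any variable it doesn't define from the layout chain.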
Yatiblog requires that there be a special yatiblog.conf file in the source directory which defines default variables. This MUST include a list of mappings for the index_pages, e.g.
index_pages:
  - archive.html: archive.genshi
  - index.html: index.genshi
  - feed.rss: feed.genshi
This will generate the specified “index” pages after all other pages are created. Yatiblog also compares the modified time of the various files and does a comprehensive dependency analysis so that only necessary files are regenerated whenever yatiblog is run.
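The modified-time comparison at the core of such a check can be sketched as below. This only covers the timestamp test for one target and its direct sources; the function name is hypothetical and the dependency analysis that decides which sources belong to which target is not shown.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

For a page, `sources` would presumably include the .txt file as well as every layout in its layout chain.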
✓ Implement a source code documentation generator.
The source documentation extractor and generator support in yatiblog has been updated with ideas from godoc and docco. It currently supports generating sexy documentation for code written in:
- CoffeeScript
- Emacs Lisp
- Go
- JavaScript
- Python
- Ruby
It extracts all comments which are immediately followed by code, i.e. with no empty lines in between, as well as any comments which look like reStructuredText headings and generates an rST document interwoven with the (syntax highlighted) source code.
// This comment will not be extracted. This is useful to
// ignore copyright headers, non-public comments, &c.

package yatiblog

// But this comment will be extracted, since there's no
// empty line after it.
const shebang = "#!"
Running yatiblog will automatically generate documentation for all paths specified as part of the code_pages setting in yatiblog.conf, with the generated content being passed to the specified layout:
code_pages:
  layout: code
  paths:
    amp.%(dir)s.%(filename)s: src/amp/*.go
    zerodata: src/zerodata/zerodata.py
The paths setting should be a mapping of filename => source_files:
- filename specifies either a single filename or a filename pattern for the generated documentation, with %(dir)s and %(filename)s being substituted by the directory and filename of the source file.
- source_files are shell patterns for files in the Git index and need to be specified relative to the repository root.
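The extraction rule described earlier in this item (a comment is kept only when code follows it with no empty line in between) can be sketched as follows. This is a simplification and not yatiblog's code: it handles a single line-comment marker, ignores the reStructuredText heading case, and `extract_comments` is a made-up name.

```python
def extract_comments(source, marker="//"):
    """Return comment blocks that are immediately followed by code."""
    extracted, block = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Collect consecutive comment lines into one block.
            block.append(stripped[len(marker):].strip())
        else:
            # A non-empty line directly after a block keeps the block;
            # an empty line discards it.
            if block and stripped:
                extracted.append(" ".join(block))
            block = []
    return extracted
```

Run on the Go snippet above, this keeps only the comment attached to the `shebang` constant.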
Safe Execution
✓ Design a safe execution architecture.
Our safe execution architecture works by using multiple layers of isolation. On the outermost layer we have a process which has full access to the raw network, memory and processor. Only absolutely trusted code should execute in this environment.
Over time, even this should be minimised as much as possible. As Mark S. Miller would tell us, security against malicious threats and protection against unintended bugs are just two sides of the same coin.
The innermost layer is where the actual execution of arbitrary code provided by users takes place. The outermost layer communicates to this layer by a series of IPC calls using Unix domain sockets.
Intermediary layers provide various layers of isolation to ensure that resource usage does not exceed certain limits — especially network access to the outer world which will be absolutely denied except through RPC calls to the outermost layer.
✗ Investigate virtualised environment options.
There are a number of virtualisation and paravirtualisation projects that have gained mainstream adoption in recent times, e.g. Xen, KVM, &c. Unfortunately these are quite bloated for our needs.
In contrast, there are a handful of Virtualised Environment offerings that are quite interesting. These allow one to manage chroot-like containers without duplicating resources unnecessarily. And thankfully, unlike chroots, they are quite hardened against privilege escalation.
On Linux there are 3 different toolsets to support this:
- OpenVZ — an impressive open source offering from the company behind the commercial Parallels virtualisation products.
- Linux-VServer — a similar offering to OpenVZ but one that has been around much longer. It offers decent security primitives like barriers and vfs namespaces.
- LXC — the only one of the three which is in the kernel mainline. It looks very promising but doesn't seem to be production ready just yet.
Unfortunately we don't have much experience in this field to properly choose between OpenVZ and VServer. Everyone seems to be either ideologically biased or have too much of a rose-tinted perspective to provide a proper technical evaluation of the two.
Another issue to bear in mind is the long term maintenance of whatever option we decide to go for — since maintaining kernel patches isn't much of a pleasant affair.
✓ Investigate seccomp sandboxing.
The Linux kernel has a really nice sandboxing mode called seccomp. Threads can switch it on by doing a prctl system call with the PR_SET_SECCOMP argument. After that, the thread can't make any further system calls besides using read() and write() on already opened file descriptors and sigreturn() and exit().
Any other system call will result in it being immediately terminated via a SIGKILL from the kernel. It's a rather elegant model with super minimal performance overhead. It can even be activated in languages like Ruby using syscall() and in Python using:
import ctypes
# use the constant defined in /usr/include/linux/prctl.h
PR_SET_SECCOMP = 22
libc = ctypes.CDLL('/lib/libc.so.6')
libc.prctl(PR_SET_SECCOMP, 1)
As explained in Adam Langley's Chromium's Sandbox article, seccomp is very desirable as it makes it very easy to prevent attacks. Unfortunately, one very quickly runs into needing various system calls, e.g. mmap for allocating memory.
To solve this, Google are currently developing a wrapper seccompsandbox which allows a process to be split into trusted and untrusted threads. The trusted thread receives requests to make system calls from the untrusted thread over a socket pair. It then validates the system call against syscall_table.c before performing it.
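The request/validate flow between the two threads can be illustrated with a plain socket pair. This sketch enforces nothing at the kernel level and is not how seccompsandbox is written; `ALLOWED_CALLS` and all function names are hypothetical, and a call is represented only by its name.

```python
import socket
import threading

ALLOWED_CALLS = {"mmap", "munmap"}  # hypothetical allow-list for this sketch


def trusted_loop(conn):
    """Trusted side: check each requested call name and send back a verdict."""
    with conn, conn.makefile("r") as requests:
        for line in iter(requests.readline, ""):
            verdict = "ok" if line.strip() in ALLOWED_CALLS else "denied"
            conn.sendall((verdict + "\n").encode())


def start_broker():
    """Start the trusted thread and return the untrusted end of the socket pair."""
    trusted_end, untrusted_end = socket.socketpair()
    threading.Thread(target=trusted_loop, args=(trusted_end,), daemon=True).start()
    return untrusted_end, untrusted_end.makefile("r")


def request_call(sock, replies, name):
    """Untrusted side: ask for a call by name and wait for the verdict."""
    sock.sendall((name + "\n").encode())
    return replies.readline().strip()
```

In the real design the trusted thread would go on to perform the validated call on behalf of the untrusted thread, rather than just replying with a verdict.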
Unfortunately the wrapper is currently not usable outside of Chromium and porting it is no parkwalkian affair as it requires disassembling memory and writing the entire trusted thread in assembler.
But at some point we should be able to leverage it to embed processes like Python, Ruby and Javascript interpreters. And, when we do invest time into that, we should review this thread on the kernel list.
✓ Investigate sandboxing using Native Client.
Native Client (NaCl) is perhaps the most exciting development in security since public key cryptography and object capability. It sets out to provide a secure sandbox for running native code in web applications.
Its design consists of two layers of defence to ensure that unsafe x86 instructions are not executed, providing protection from buffer overflow exploits, illegal filesystem access and even resource exhaustion attacks.
It's possible to port large applications to run on top of the Native Client Runtime without too much trouble. There have already been ports of both Ruby on NaCl and Python on NaCl with varying degrees of success.
Unfortunately NaCl still needs a little bit more development before it can be used in anger, e.g. support for 64-bit platforms, handling of JIT'd interpreters, dynamic linking and use outside of browser environments.
✓ Investigate access limiting in Linux.
The POSIX Capabilities in the Linux kernel are so useless that they defy belief. They provide a way for a root-privileged process to drop various privileges. But, in practice, they add an additional attack vector that we have to protect against.
In contrast, setrlimit is a useful system call that allows us to set hard limits on a process's resource usage, e.g.
>>> import os
>>> import resource
>>> # set (soft, hard) limit so new processes can't be created
>>> resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))
>>> # set limits so new file descriptors can't be created
>>> resource.setrlimit(resource.RLIMIT_NOFILE, (0, 0))
>>> # creating new processes will now fail
>>> os.system('ls')
-1
>>> # and the process can't later increase the limits, woo!
>>> resource.setrlimit(resource.RLIMIT_NPROC, (1, 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not allowed to raise maximum limit
This isn't quite perfect, e.g. unlink() and rename() still work, but it's a start. Next up we can use chroot to drop access to the file system. This is quite tricky to get right and there are a number of issues to take into account.
For starters, the chroot() call needs root privileges. So we need to make the call, drop capability privileges and then switch to an unused real/effective/saved UID and GID. This is a lot more complicated than it sounds. See the setuid sandbox below for more info.
The Writing Secure Privileged Programs, Setuid Demystified and Security In-Depth for Linux Software documents have a lot of good pointers worth following.
✗ Implement a setuid sandbox wrapper for Linux.
The Chromium guys have implemented a pretty decent Setuid Sandbox. It provides a wrapper for chrooting, dropping root privileges and uid/gid switching. A co-operative process which wants to be sandboxed will have to communicate with it so that it can clone() with CLONE_FS and then chroot() to an empty directory.
The kernel needs to be compiled with PID namespaces support so that CLONE_NEWPID can be used to prevent the sandboxed process from ptrace()-ing other processes. The source for the sandbox can be found in the Chromium repository.
The code is currently geared for use by just Chromium. We need to extract out the useful bits and make it into a sandbox implementation suited to our specific use for supporting safe execution.
✗ Investigate mandatory access control in Linux.
There are a number of mandatory access control mechanisms on Linux. Most of these make use of the LSM (Linux Security Modules) framework in the kernel:
- SELinux is the big boy. Unfortunately it's so complicated that it's pretty hard to figure out if a policy is actually secure.
- Tomoyo is relatively lightweight and uses pathname-based controls to separate security domains.
- AppArmor is a similar alternative but its future is seemingly uncertain since Novell laid off the dev team.
- SMACK offers an even simpler labels-based approach.
- PaX/grsecurity are a series of kernel patches for general all-round “hardened” protection.
Most of these are so complex that one wonders whether they offer any real security. This article offers a brief comparison of SELinux and SMACK setups and is a good companion read to the Linux Kernel Security Overview presentation.
Since there is a combined VServer/grsecurity patch available, that seemed a good point to begin investigating. Using VirtualBox, created a 64-bit virtual machine with a minimal Ubuntu Karmic Koala install.
Following the Kernel Mainline Builds instructions, have obtained Karmic packages for vanilla 2.6.31.12 kernel and its sources. Found the Ubuntu to Mainline kernel mapping to be a useful resource in doing so.
After installing the packages and rebooting the virtual machine, following the VServer Installation on Linux 2.6 instructions, applied the combined VServer 2.3/grsecurity patch to the sources. Reused .config from Christoph Lukas' 2.6.31-19-vserver kernel obtained from his Karmic package repo. He also has a ubuntu-karmic-vserver git repo.
Ran make oldconfig and make menuconfig to configure grsecurity/PaX options. Used this VServer Administration wiki page and this config file to inform the Grsecurity and PaX Configuration Options that were chosen. Used the command fakeroot make-kpkg --initrd --revision=20100306 kernel_image kernel_headers to kick off the custom kernel build.
✓ Investigate the PyPy sandbox.
PyPy-based interpreters can be translated with a special --sandbox option so that they run in a sandboxed mode. The translation ensures that system calls are marshalled to stdout, and the sandboxed process then waits for each call to be executed by a controller process and for the result to be returned via stdin.
The controller process can validate the calls and present whatever view it wants, e.g. a virtual filesystem, to the sandboxed process. Since the sandbox needs to make certain system calls itself, e.g. to allocate memory, these are enabled with a sandboxsafe=True on a case-by-case basis, e.g.
malloc_fn_ptr = rffi.llexternal(
GC_MALLOC,
[lltype.Signed],
llmemory.GCREF,
compilation_info=compilation_info,
sandboxsafe=True,
_nowrapper=True
)
This is done in such a way that although the garbage collector can allocate memory, other functions trying to call malloc() will have to be validated and run by the controller process. Of course the controller process only has to implement calls that make sense for it to call.
It would be really cool to add seccomp support to the sandboxed process as an additional layer of defence — any such implementation would have to take into consideration issues like fishing GC roots from the stack of the various threads.
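The controller/sandbox relationship can be illustrated with a toy sketch. This is not PyPy's actual wire protocol: it assumes a made-up line-based JSON format, a hypothetical virtual filesystem, and an ordinary Python child process standing in for the sandboxed interpreter.

```python
import json
import subprocess
import sys

# Virtual filesystem the controller chooses to expose (hypothetical).
VIRTUAL_FS = {"/hello.txt": "hi"}

# Stand-in for the sandboxed process: every "system call" is written to
# stdout and its result is read back from stdin.
CHILD_SOURCE = r'''
import json, sys

def syscall(name, *args):
    sys.stdout.write(json.dumps({"call": name, "args": list(args)}) + "\n")
    sys.stdout.flush()
    return json.loads(sys.stdin.readline())

syscall("read_file", "/hello.txt")
syscall("unlink", "/hello.txt")
'''


def handle(request):
    """Validate a marshalled call; only reads of virtual files are allowed."""
    args = request.get("args") or [None]
    if request.get("call") == "read_file" and args[0] in VIRTUAL_FS:
        return {"ok": True, "value": VIRTUAL_FS[args[0]]}
    return {"ok": False, "error": "denied"}


def run_controller(child_source):
    """Run the child, answer its calls and return a log of (call, allowed)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    log = []
    for line in proc.stdout:
        request = json.loads(line)
        reply = handle(request)
        log.append((request["call"], reply["ok"]))
        proc.stdin.write(json.dumps(reply) + "\n")
        proc.stdin.flush()
    proc.stdin.close()
    proc.wait()
    return log
```

The child never touches a real file: the read is answered from the controller's virtual view and the unlink is refused.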
✓ Implement PyPy sandbox support for the JIT.
The JIT support in PyPy is becoming quite impressive and will be extremely well suited for “number crunching” in the cloud. Unfortunately it complicates matters with regard to the PyPy sandbox.
But thankfully fijal has now made the sandbox work with JIT on the jit-sandbox branch on the PyPy subversion repository.
Request Handling
✓ Investigate managed DNS providers.
Any effort to provide high availability is buggered if a user's first action — doing a DNS lookup for ampify.it — is unresponsive/faulty. We're currently using Linode's super nice DNS service that comes with our VPS, but should find another provider as we outgrow it.
UltraDNS is the big boy in the managed DNS world and is priced so high that only other big boys could afford it. DNS Made Easy, in comparison, costs over 100x less! They have an impressive offering at very affordable prices:
- DNS servers running on a globally distributed anycast network — so users should get their queries answered faster.
- Automated updating of A records to failover IPs if their monitoring system (which runs every 2-4 minutes) detects our servers to be unresponsive.
One major feature they lack is geo-location based serving of records. For example sending requests from Italy to our European data center and requests from Brazil to our North American data center. They do suggest on their Facebook page that support for this might get added later this year.
The more expensive Dynect service — who provide everything offered by DNS Made Easy and even seem to be marginally faster — has geo-location based serving as part of their Global Server Load Balancer solution.
Their combined monitoring, load balancing and geo-routing sounds so good that it's tempting to forego any kind of LVS based load balancing at our data centers. However this has a few drawbacks:
- It will not be possible to enable people to point a root/apex domain at our servers as it will not have a relatively stable IP address.
- The monitoring by Dynect is not going to be as responsive as an LVS director running on the local network.
✗ Investigate load balancing solutions.
In public deployments, requests should be load balanced across Ampify frontend servers. Hardware load balancers simply don't offer enough value to justify their cost, so we need to find a decent software-based solution.
There are a number of LVS (Linux Virtual Server) components that are worth exploring: ldirectord, heartbeat, keepalived, &c. Hosting companies like Hostway also offer managed LVS which might be worthwhile.
Whatever option is decided upon, it should be based on open source software, have redundancy for the load balancer itself and deal with health checks for HTTP, HTTPS and SMTP backend servers.
✗ Setup network monitoring infrastructure.
Monitoring of resource usage is essential for detecting attacks and identifying potential bottlenecks to optimise — not to mention capacity planning. Thankfully there's increased support for protocols like SNMP nowadays.
There are a number of tools that can help with the monitoring, e.g. rrdtool, munin, shinken, cacti, mrtg, &c. The appropriate tools should be set up on our frontend servers and an overview should be accessible at centralised location(s).
✗ Define the server firewall infrastructure.
There should be a set of iptables configurations:
- Frontend servers should only allow for external TCP traffic to ports 8025, 8080 and 8443.
- Internal traffic should be able to connect on all non-privileged ports.
We should have a decent default configuration which protects against common denial of service attack patterns. It should also be possible to do live updates of denied hosts on our servers in addition to working with our hosting companies to block hosts at the network switches.
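One way to keep such configurations consistent across servers is to generate the rules from a short description. The sketch below is only an illustration of the two rules listed above: the internal network range is an assumption, and it deliberately says nothing about admin access or denial of service protection.

```python
FRONTEND_TCP_PORTS = (8025, 8080, 8443)
INTERNAL_NETWORK = "10.0.0.0/8"  # assumption: the private network range


def frontend_rules(ports=FRONTEND_TCP_PORTS, internal=INTERNAL_NETWORK):
    """Return iptables commands for a frontend server, in application order."""
    rules = [
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
    ]
    # External TCP traffic is only allowed to the listed ports.
    for port in ports:
        rules.append("iptables -A INPUT -p tcp --dport %d -j ACCEPT" % port)
    # Internal traffic may connect on all non-privileged ports.
    for proto in ("tcp", "udp"):
        rules.append(
            "iptables -A INPUT -s %s -p %s --dport 1024:65535 -j ACCEPT"
            % (internal, proto)
        )
    # Default policy goes last so a live session isn't cut off halfway through.
    rules.append("iptables -P INPUT DROP")
    return rules
```

Printing the list one command per line gives a script that can be reviewed before it is applied.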
✗ Buy an SSL certificate for ampify.it.
We should only serve Ampify requests over SSL. While this means a lot of additional cost to us due to the extra resources used by secure connections, the security benefit to end users will make this worthwhile, e.g. protection against man in the middle attacks.
There are only a few root Certificate Authorities recognised by most browsers. Most of the companies like GeoTrust (Equifax) and Thawte are owned by Verisign. They take advantage of their monopoly by charging through the roof for something that costs very little.
There are a number of cheaper SSL certificate providers like Comodo and GoDaddy. RapidSSL, a GeoTrust subsidiary, also offer fairly cheap certs. As of this writing they are offering a wildcard cert for $199/year.
As for EV certificates, there's no conclusive evidence that the green bar provides any real additional value, so we'll skip that for now.
✓ Investigate frontend web servers.
In an ideal world the Ampify proxy node would be a production level HTTP/1.1 compliant web server. But unfortunately that's a while off and until then we need to have a decent web server in front of it.
There are a number of decent options available nowadays. Of these, nginx offers good performance, stability and even support for upgrading the nginx binary without downtime. Cherokee and lighttpd are decent alternatives but do not inspire as much confidence as nginx.
✗ Put together an nginx package for redpill.
Nginx has two different branches: 0.7.x (stable) and 0.8.x (development). The development branch should be tested for stability, otherwise the package should be based on the stable branch.
The package build file should ensure that the http_ssl, http_gzip, http_proxy and any other relevant modules are built — minimising needless bloat and dependencies (i.e. security holes) as much as possible.
✗ Define a configuration file for the nginx servers.
There needs to be a default nginx config file we can use across all of our frontend servers. This should cover everything from ensuring that requests are served over SSL (redirecting non-secured connections) to passing on the request to the Ampify proxy nodes.
We'll be using proxy_pass since FastCGI doesn't seem to offer enough of a performance benefit. Special care should be taken to handle the request IP address and gzip encoding — the tornado config is a good starting point.
The configuration should be optimised based on benchmarking using ab or httperf. It's unfortunate that nginx doesn't have HAProxy's really useful stats page, but perhaps something could be approximated using nginx-rrd to help with identifying bottlenecks.
✗ Implement ping responses on the frontend.
A number of different services from the monitoring systems of our DNS providers to the LVS director will be doing health checks on our frontend web servers. These should support the following responses:
/a/ping.frontend — a response page on the frontend nginx servers which should simply return the plain text pong.
/a/ping.proxy — a response service on the proxy node behind the frontend servers. This should respond with appropriate values for the following JSON structure:
{"zone": "eu-1", "timestamp": 1266978987.705}
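A minimal sketch of the /a/ping.proxy service as a plain WSGI callable follows. The ZONE constant and the use of WSGI are assumptions for illustration; the actual proxy node may expose this quite differently.

```python
import json
import time

ZONE = "eu-1"  # assumption: each proxy node is configured with its own zone


def ping_app(environ, start_response):
    """Answer /a/ping.proxy with the node's zone and the current timestamp."""
    if environ.get("PATH_INFO") != "/a/ping.proxy":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps({"zone": ZONE, "timestamp": round(time.time(), 3)})
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body.encode("utf-8")]
```

A health checker can then treat anything other than a 200 response with a recent timestamp as a failure.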
✗ Conduct pentesting of the frontend infrastructure.
Unfortunately the Internet isn't as friendly a place as we might like it to be. There are folks who would like to break into our servers just for fun. As the saying goes, “If you can't beat them, join them”. Pre-launch, we need to conduct thorough penetration testing of our servers to ensure that they can keep out the script kiddies and more capable adversaries.
Status Monitoring
✓ Investigate off-site uptime and latency monitoring services.
It's important to have off-site monitoring so that we can quickly know about issues that end users might be experiencing. We could run something like Nagios on a bunch of off-site VPS nodes, but it's bloated and ugly.
The same goes for monitis which is a pretty decent third-party monitoring and notification service. Its competitor Pingdom looks a lot prettier but charges too much without offering enough flexibility. Conclusion: code our own monitoring service.
✗ Implement data gatherers for remonit — a remote monitoring tool.
The remonit tool should support configurable monitoring of a number of different factors by wrapping around mature Unix tools:
- Network latency using ping -c 1.
- Network routing using traceroute.
- DNS lookup speed and results using host -v.
- HTTP/HTTPS latency and responses — with string, headers and status code check.
- SSL certificate verification.
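As an example of wrapping one of these tools, the latency gatherer could look roughly like the sketch below. The function names are hypothetical, and the parser assumes the common `time=11.6 ms` format printed by ping on Linux and OS X.

```python
import re
import subprocess

_TIME_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def parse_ping_latency(output):
    """Return the round-trip time in milliseconds, or None if there was no reply."""
    match = _TIME_RE.search(output)
    return float(match.group(1)) if match else None


def ping_latency(host, timeout=5):
    """Run `ping -c 1 host` and return the latency in ms, or None on failure."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_ping_latency(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be tested against captured output without touching the network.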
✗ Implement the configuration system for remonit.
The config system should be painless to manage and support monitoring of various worst-case scenarios for the local remonit id. The remonit app should also support live reloading of the config when sent a SIGHUP.
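Live reloading on SIGHUP might be sketched as follows. The JSON config format and the class and method names are assumptions. The handler only sets a flag, and the main loop does the actual reload, keeping the old config if the new file is broken.

```python
import json
import signal


class Config:
    def __init__(self, path):
        self.path = path
        self.data = {}
        self.reload_requested = False
        self.load()

    def load(self):
        with open(self.path) as f:
            new_data = json.load(f)  # raises ValueError on a malformed file
        self.data = new_data

    def request_reload(self, signum=None, frame=None):
        # Signal handlers should do as little as possible: just set a flag.
        self.reload_requested = True

    def install(self):
        """Register the SIGHUP handler (must be called from the main thread)."""
        signal.signal(signal.SIGHUP, self.request_reload)

    def reload_if_requested(self):
        """Call from the main loop; returns True if a new config was loaded."""
        if not self.reload_requested:
            return False
        self.reload_requested = False
        try:
            self.load()
        except (OSError, ValueError):
            return False  # keep running with the previous config
        return True
```

After calling install(), sending `kill -HUP <pid>` to the process causes the next pass of the main loop to pick up the new file.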
✗ Implement a data persistence layer for remonit.
The data gathered by remonit should be persisted to a local Redis instance according to the configuration. The redis data should be pruned every so often and saved to S3 so that we have historic data for the future.
✗ Implement a notification system for remonit.
Certain events from remonit's data gatherers should trigger configurable notifications. Notifications should support “dampeners” so as to avoid notification floods.
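One possible reading of a “dampener” is a per-event time window during which repeats are suppressed. The sketch below assumes that interpretation. The class name and the injectable clock are illustrative choices rather than part of any existing remonit code.

```python
import time


class Dampener:
    """Suppress repeats of the same notification within `window` seconds."""

    def __init__(self, window, clock=time.time):
        self.window = window
        self.clock = clock
        self.last_sent = {}  # event key -> time of the last delivered notification

    def should_notify(self, key):
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the window: swallow the repeat
        self.last_sent[key] = now
        return True
```

A notifier would call should_notify("http:eu-1:down") before sending, so a flapping check produces one message per window instead of a flood.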
There should be notifiers for email (via SendGrid), SMS (via BulkSMS), Growl (OS X Desktop) and Prowl (iPhone). Perhaps take a look at aeservmon for any relevant code.
✗ Export the remonit data over a web interface.
A basic web application should be implemented which exports the local remonit data as JSONP over HTTP with the following API:
- /get/<id> — return the full data associated with the given event ID.
- /info/<profile>/<year>/<month>/<day> — return a mapping of configuration IDs to a list of [timestamp, event ID, latency value, error state, optional throughput value] for the given date and profile.
- /list/<profile> — return a list of configurations (name, id and description) associated with the given profile.
- /status/<year>/<month>/<day> — return a mapping of profiles to a list of summary state (okay/disrupted) for the 8 days leading up to the specified date (including the current state).
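The URL scheme above and the JSONP wrapping can be sketched without committing to a web framework. The route table mirrors the four endpoints listed. The helper names and the callback validation rule are assumptions.

```python
import json
import re

_DATE = r"(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})"
ROUTES = [
    ("get", re.compile(r"/get/(?P<id>[^/]+)")),
    ("info", re.compile(r"/info/(?P<profile>[^/]+)/" + _DATE)),
    ("list", re.compile(r"/list/(?P<profile>[^/]+)")),
    ("status", re.compile(r"/status/" + _DATE)),
]
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*", re.ASCII)


def match_route(path):
    """Return (endpoint name, captured parameters) or None if nothing matches."""
    for name, pattern in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return name, match.groupdict()
    return None


def jsonp(data, callback=None):
    """Serialise data as JSON, wrapped in callback(...) when one is given."""
    body = json.dumps(data)
    if callback is None:
        return body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    return "%s(%s);" % (callback, body)
```

Validating the callback name matters because it is echoed straight back into executable JavaScript.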
✗ Setup off-site servers for running remonit.
Linode provide a really nice VPS hosting service. We can setup one linode in the U.S. and another in the U.K. — enabling us to have off-site monitoring of the North American and European Ampify data centers:
- eu.status.ampify.it
- us.status.ampify.it
We'd need to find a decent Asian VPS host for monitoring our Asian data center when it goes live.
✗ Implement the status.ampify.it app.
The app should be implemented as a Sinatra app on Heroku as this would mean that the site would be on EC2, away from the rest of our deployment. The site should be pretty to look at as well as useful.
It should take the JSON data from the various remonit web instances and display it as sexy Google Charts like on the App Engine Status page. It should also support broadcast messages like on the GitHub Status site — as well as scheduled maintenance announcements and calendar.
Hosting
✓ Investigate cloud service providers.
Choosing good hosting provider(s) for the Commons deployment is going to be important for the long-term success of Ampify. Unfortunately the current set of cloud services do not seem to be spectacularly attractive choices.
Services like Amazon EC2 are impressive from a technical perspective. For example, Amazon's Elastic Load Balancer makes what would otherwise be a pretty painful experience quite effortless.
However, on the flip side, the abstractions also introduce a lot of additional complexities to manage. This is most notable with the case of Amazon's Elastic Block Store. Writing a file to disk involves half a dozen support structures!
Cloud offerings also make very little sense from a financial perspective. Not only do comparable “raw metal” servers cost substantially less, but you also get a noticeable increase in raw performance.
And, finally, multi-tenancy introduces a host of novel security considerations. Server management is complicated enough without having to worry about these and other gaming issues.
✓ Investigate dedicated server providers.
Hostway seem to have quite a competitive offering with offers of good support and “100% uptime”. The global distribution (North America, Europe, East Asia) of their data centers is also quite attractive with regard to future expansion.
Of the other providers, Terremark looks quite impressive, but do not seem to be affordable at a small scale. Codero (USA) and Hetzner (Germany) are very attractive price-wise, but may not have the most reliable network connectivity.
✗ Investigate/negotiate deal with Hostway.
There are a number of issues to verify:
- What's their process for scheduled maintenance? Do they have an IRC channel for live support?
- They don't have an IRC channel, but do offer 24x7 support and a ticketing system.
- What are their limitations with regards to provisioning new servers? At what rate can they handle orders for new servers?
- They can deploy new servers (from any of their default configs) within 24 hours, which could potentially be rushed. Custom configurations would take longer though.
- What are our options about deploying across their various data centers? What is the latency between these locations?
- We can choose which data centers to deploy to.
- Do they offer ECC RAM on their servers? What kind specifically?
- Yes, the RAM on the servers we've been looking at is DDR3 ECC.
- Do they offer servers with Gigabit NIC? This will be quite important for the frontend servers.
- All servers are fitted with Gigabit NICs and it's quite easy to have the connection switched from 100Mb/s to 1Gb/s.
- What happens when servers exceed their allocated 2TB bandwidth? Is “internal” bandwidth charged for? What about across their data centers?
- Excess bandwidth is charged for at $0.35/GB. We can prebuy bandwidth (as long as it's over 2TB) at a rate of $0.09/GB.
- We can arrange for our servers to be setup on a private switch in which case we wouldn't be charged for internal traffic.
- Do they offer static IP addresses? What's their process for allocation of new address blocks?
Tav and Mamading are currently engaged in a dialogue with Jamie Fryer from Hostway.
Launch
✓ Design an Ampify logo.
There is now an Ampify logo:
It is derived from the Tamil Aayutha Ezhuthu and Roerich's Banner of Peace inspired Espian logo by Nadine Gahr with touches from Matt Morse. The typeface used is Cholla Unicase from Emigre.
✗ Figure out the launch strategy.
The whole Espian vision is pretty broad. We need to present Ampify in a way that is both accessible and attractive to the initial target audience. Current thoughts are to make an invitation to people to become one of 6,000 founding members to change the course of history:
- Create Weapons of Mass Construction!
✗ Create the storyboard for the launch video.
We need to convey the entire vision in an easily understandable, fast paced narrative within 10 minutes.
✗ Create the content for the launch video.
A combination of Apple's Keynote and Motion should be used to create evocative content for the video. It is justifiable to pack in a lot of information, e.g.
However the layout of the content should both be pleasant and allow for readers/viewers to easily ignore detail that they're not interested in.
✗ Compose the sound track for the launch video.
The effectiveness of the launch video can be dramatically increased by complementing it with a good sound track. The emotional palette should correspond to the message we're trying to convey.
✗ Record the voice-over for the launch video.
The voice-over should be in plain English so as to be accessible to as wide an audience as possible. Detail should be left to the visual content.
✗ Encode the launch video for publishing.
There are a number of different encodings we could use. H.264 will probably yield the most effective distribution.
✗ Upload the launch video to YouTube/Vimeo.
The video needs to be uploaded to the YouTube account. We also have the 1,000,000th video spot on Vimeo which might be worth using.
✗ Update the articles on espians.com.
Some people will inevitably check out the espians.com site, so updating some of the articles on it could avoid possible confusions that may otherwise arise.
✗ Implement an Ampify app to help DM followers on Twitter.
Whilst not a great collaboration platform, Twitter is a very effective broadcast mechanism. We should leverage this to help spread the word. It is quite likely that those who're following us will be interested in Ampify, so on launch day we should DM them all asking for their help in retweeting.
By doing so, we can also personally inform them about Ampify in a way that's relevant to them. However, since many followers would be common to various Espians, we need to have a small utility app on Ampify which would:
- Figure out who our followers are and show which of us they are following.
- Allow us to mark them as having been contacted so that they don't get spammed by many of us contacting them at once.
- Open a new window for each follower which shows their current timeline in one frame (in case they've already found out about Ampify from elsewhere) and the form for sending them a DM in another.
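The bookkeeping behind the first two points is small. A sketch, assuming the follower lists have already been fetched from Twitter and using hypothetical names throughout:

```python
from collections import defaultdict


def followers_to_espians(followers_by_espian):
    """Invert {espian: [followers]} into {follower: set of espians followed}."""
    mapping = defaultdict(set)
    for espian, followers in followers_by_espian.items():
        for follower in followers:
            mapping[follower].add(espian)
    return dict(mapping)


class ContactLog:
    """Track who has been contacted so that nobody gets DMed twice."""

    def __init__(self):
        self.contacted = {}  # follower -> espian who sent the DM

    def claim(self, follower, espian):
        """Return True if espian may DM follower, False if already contacted."""
        if follower in self.contacted:
            return False
        self.contacted[follower] = espian
        return True
```

Claiming a follower before opening the DM form is what prevents two Espians from messaging the same person at once.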
✗ Launch!
We need to get everyone together for an organised launch over a 48 hour period:
- Aggregator sites (Hacker News, Reddit, Digg, Slashdot, Techmeme, &c.)
- Major blogs (ReadWriteWeb, TechCrunch, Boing Boing, &c.)
- Facebook (including various Groups/Pages)
- Selected mailing lists
- Selected individuals
- Selected communities (TED, GitHub, EDGE, nettime, &c.)
Care should be taken so as to spread the initial outreach to as diverse a community as possible.
Admin
✓ Register Metanational Commons Limited.
The entity “Metanational Commons Limited” has been officially registered with Companies House as Company No. 06834341 in England & Wales. The following individuals are the current officials for the company:
- Tav (Vageesan Sivapathasundaram)
- Sofia Bustamante
The official address is at our accountant's place to make handling of admin forms easier.
✗ Have a pre-launch meeting with the accountant.
Need to have a meeting with the accountant to go over a number of details, including:
- VAT registration
- Annual returns (due 31st March)
It might also be worth spending some time looking into incorporating a US subsidiary — as an LLC or a C-Corp.
✓ Setup a bank account for Metanational Commons.
A bank account has been setup with HSBC for Metanational Commons with Sofia having primary access to the account.
✓ Setup a Paypal account for Metanational Commons.
A Paypal account has been setup, verified and linked to the HSBC account.
✓ Investigate payment processors for handling membership subscriptions.
Braintree sound like a pretty awesome payment processor. They offer a combined merchant account and payment gateway. And their API is such that our servers never have to even touch users' credit card details!
✓ Contact Braintree.
Tav emailed Braintree asking about their pricing, sign up process and whether they already have a Python library for their API. Unfortunately they only provide merchant accounts for US companies — we'd need a Federal Tax ID (EIN) and a US bank account.
✗ Define the Terms of Service and Privacy Policy.
We need to have a decent terms of service and privacy policy for the Ampify service. They should be easily readable and focus on giving users as much control as possible while minimising our exposure to litigation.
✓ Setup a source repository for Ampify.
A repository has been setup on GitHub for the reasons outlined in our "getting started with git" article. It can be found at github.com/tav/ampify.
To clone the repo:
$ git clone https://github.com/tav/ampify.git
Those who've got write access should instead:
$ git clone git@github.com:tav/ampify.git
Ask Tav if you'd like write access.
✓ Define a Public Domain License.
It is insane that we cannot simply declare our work to be in the Public Domain. The folk at CERN and Tim Berners-Lee were able to place the first implementation of the Web into the Public Domain with this simple document:
Unfortunately we live in more complex times, where we have to take into account issues like moral rights and their implications. Therefore we need a Public Domain declaration that also doubles up as a license approximating it for the jurisdictions where one can no longer place works into the Public Domain.
A document based heavily on Creative Commons Zero with additional clauses for grant of patent rights has now been put together as our Public Domain License. The license is seen as a mere transitional requirement until international law adapts to the post intellectual property reality.
Contextualisation
✗ Implement basic language guessing.
Being able to guess the language of a piece of content means that we won't waste resources trying to translate it into the language it is already written in. Google provide a Language Detection API but this requires additional requests to be made.
Thankfully, it isn't too complicated to do it ourselves. As a first pass, we can leverage the Unicode Character Database to detect the use of foreign language scripts, e.g. Japanese.
>>> import unicodedata
>>> unicodedata.name('ホ')
'KATAKANA LETTER HO'
Secondly, we can do an n-grams based statistical analysis of the content against pre-generated models for various languages. As a starting point we can use existing models from various open source initiatives.
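The first pass, script detection via the Unicode Character Database, might look like the sketch below. It relies on a heuristic rather than a real script property: the first word of each character's Unicode name (LATIN, KATAKANA, TAMIL, CJK and so on) is taken as its script. The function name is hypothetical.

```python
import unicodedata
from collections import Counter


def scripts_used(text):
    """Count letters per script, using the first word of each character name."""
    counts = Counter()
    for char in text:
        if not char.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(char, "")
        if name:
            counts[name.split(" ")[0]] += 1
    return counts
```

Text that is overwhelmingly KATAKANA, HIRAGANA or CJK can be tagged as Japanese or Chinese without any statistical model, leaving the n-gram step for scripts shared by many languages.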
The language guessing should be exposed as a function of the language service:
def guess(text:string) -> list:
return list_of_guesses
For example, calling it with the following snippet of Madras Tamil written in English, it should return something like:
>>> guess("Enna da oru leave letter kooda kudukama poitta?")
[(</lang/madras-tamil/latin>, 0.9714), (</lang/en>, 0.0672)]
That is, it should return a list of (language-identifier, likelihood) tuples, in descending order of likelihood.
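A toy version of the n-gram pass is sketched below. Several things here are assumptions: the two tiny models are stand-ins for properly trained ones, language identifiers are plain strings, and the “likelihood” is simply cosine similarity between character trigram counts, normalised so the scores sum to one.

```python
import math
from collections import Counter


def trigrams(text):
    """Count character trigrams of the lowercased, whitespace-normalised text."""
    padded = " %s " % " ".join(text.lower().split())
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


# Toy models built from one sentence each; real models need a large corpus.
MODELS = {
    "/lang/en": trigrams("the quick brown fox jumps over the lazy dog "
                         "and then the cat sat on the mat"),
    "/lang/fr": trigrams("le renard brun saute par dessus le chien "
                         "paresseux et le chat dort sur le tapis"),
}


def guess(text, models=MODELS):
    """Return [(language-identifier, likelihood)] in descending order."""
    profile = trigrams(text)
    scores = {lang: cosine(profile, model) for lang, model in models.items()}
    total = sum(scores.values())
    if not total:
        return []
    ranked = [(lang, score / total) for lang, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Swapping in the rank-order distance from [Cavnar-1994] would only change the cosine function; the shape of the result stays the same.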
Misc Plan Items
✗ Implement Ampify Server.
The first deployment scenario for the Ampify Platform will be running the Ampify Engine (backend) natively on high performance dedicated servers in a data centre environment.
Ideally we'd use a minimal Ubuntu LTS base (Lucid Lynx?) as a foundation for this, something that's more stripped down than the default Ubuntu Server Edition base install.
In the RHEL/CentOS world, KickStart would be used to construct the desired minimal install. Preseed is the Ubuntu/Debian equivalent.
✗ Implement Ampify Desktop.
The next scenario has the Ampify Engine running in a virtual environment hosted on the user's regular desktop OS (Windows, Mac OS X, Linux) alongside the Ampify Browser which itself will be a native application.
An open question is which desktop virtualization environment to use. It needs to be one that is freely redistributable which probably disqualifies VMware and possibly the non-OSE version of VirtualBox. Another option is QEMU.
✗ Implement Ampify Mobile.
Another scenario has the Ampify Browser running as a native application on mobile devices such as smartphones running iPhone OS, Android, MeeGo and possibly Blackberry.
It should be possible to port the Ampify Browser to all mobile platforms that have a WebKit implementation.
✗ Implement Ampify OS.
The final scenario has the Ampify Engine and Ampify Browser running natively on mobile devices such as netbooks, tablets (such as the Notion Ink Adam), smartphones and smartbooks, eventually moving up to running on laptops and desktops.
The open question is whether to build this up from the same base used for the Ampify Server or possibly use Chromium OS or MeeGo for a base.
✗ Create a Makefile for the Go packages.
There's an experimental goinstall which allows for one to install packages with a command like goinstall github.com/hoisie/web.go.git. The package can then be imported in code using:
import "github.com/hoisie/web.go.git"
The command will also recursively download all remote packages defined in the import statements of the packages it's installing, making it super nice to use.
✗ Implement location to timezone mapping.
Knowing the local time in different parts of the world is very useful — whether one is travelling or simply wanting to know if it's a reasonable time to call a fellow collaborator on the other side of the planet.
To help facilitate this, there will be a single utility function as part of the timezone service:
def get_timezone_at(latitude:number, longitude:number) -> string:
return timezone
The /tzlocations namespace will be reserved for the data needed by the service.
✗ Generate language models from Wikipedia.
Wikipedia acts as a reasonably sized corpus for various languages. A mix of various techniques could be used to extract markup-free plain text from it to derive better language models.
The same opportunity should also be used to generate word-based n-grams as well as character-based ones — giving us a chance to develop other interesting applications besides just plain old language guessing.
Conventions
Whilst this document is not meant to be normative, it hopefully provides a basis for Ampify specifications to eventually emerge. These will be reviewed and accepted as part of the Open Culture Standards Index (OCSI) in a process similar to the Python Enhancement Proposals [PEP-1], i.e. an open process with the BDFL (Tav) giving final approval.
In the meantime, the key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” should be interpreted as described in RFC 2119 [RFC-2119].
The use of Go, Javascript, Python and Ruby syntax in code snippets is only for the purposes of clarity in expression. It seemed to make better sense than inventing arbitrary syntax for pseudo-code [Norvig-2000].
References
| [Ardia-2008] | Court Orders Wikileaks.org Shutdown David Ardia, Citizen Media Law Project, February 2008. |
| [Beck-2002] | Test Driven Development: By Example Kent Beck, Addison-Wesley Professional, 2002. |
| [Berners-Lee-1989] | Information Management: A Proposal Tim Berners-Lee, CERN, 1989. |
| [Berners-Lee-1998] |
Tim Berners-Lee, W3C, 1998. |
| [Bush-2007] | IPv6 Transition & Operational Reality Randy Bush, NANOG, 2007. |
| [Candea-2003] |
George Candea and Armando Fox, Proceedings of the 9th Workshop on Hot Topics in Operating Systems (HotOS-IX), Lihue, Hawaii, May 2003. |
| [Cavnar-1994] | N-Gram-Based Text Categorization William B. Cavnar and John M. Trenkle, Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval (SDAIR-94), Las Vegas, April 1994. |
| [Chatham-2010] | Beyond the Dollar: Rethinking the International Monetary System Chatham House Report, Edited by Paola Subacchi and John Driffill, March 2010. |
| [Cawley-2008] | Code is Data, and It Always Has Been Piers Cawley, April 2008. |
| [Cerf-2006] | Prepared Statement of Vinton G. Cerf Vinton G. Cerf, U.S. Senate Committee on Commerce, Science, and Transportation Hearing on “Network Neutrality”, February 2006. |
| [Cheng-2008] | Keeping Network Solutions from Cashing in on Your Subdomains Jacqui Cheng, Ars Technica, April 2008. |
| [Edwards-2004] | Manifesto of the Programmer Liberation Front Jonathan Edwards, June 2004. |
| [Edwards-2006] |
Jonathan Edwards, MIT CSAIL, May 2006. |
| [Edwards-2007] |
Jonathan Edwards, OOPSLA, October 2007. |
| [Eich-2007] |
Brendan Eich, Mozilla Corporation, 2007. |
| [Egnor-2001] | Airhook — Reliable, efficient transmission control for networks that suck Dan Egnor, 2001. |
| [FIPS-180-1] |
National Institute of Standards and Technology, U.S. Department Of Commerce, April 1995. |
| [Fitzpatrick-2007] |
Brad Fitzpatrick, August 2007. |
| [Frankston-2001] | DNS: A Safe Haven for URLs and Internet Identifiers Bob Frankston, August 2001. |
| [Gabriel-2007] |
Richard P. Gabriel, November 2007. |
| [Iskold-2008] | Top 10 Traits of a Rockstar Software Engineer Alex Iskold, ReadWriteWeb, 2008. |
| [Kegel-2001] |
Dan Kegel, 2001. |
| [Leggett-2008] |
Russell Leggett, April 2008. |
| [Levien-2000] |
Raph Levien, February 2000. |
| [Levien-2004] | Attack Resistant Trust Metrics Raph Levien, Draft Thesis, 2004. |
| [Miller-1988] | Markets and Computation: Agoric Open Systems Mark S. Miller and K. Eric Drexler, The Ecology of Computation, Bernardo Huberman (ed.) Elsevier Science Publishers/North-Holland, 1988. |
| [Neuberg-2003] | Introduction to the Peer-to-Peer Sockets Project Brad Neuberg, March 2003. |
| [Norvig-2000] |
Peter Norvig, 2000. |
| [PEP-1] |
Barry Warsaw, Jeremy Hylton, David Goodger, June 2000. |
| [RFC-768] |
Jon Postel, August 1980. |
| [RFC-793] |
IST, DARPA, September 1981. |
| [RFC-1034] | Domain names - Concepts and Facilities Paul Mockapetris, November 1987. |
| [RFC-2119] | Key words for use in RFCs to Indicate Requirement Levels Scott Bradner, March 1997. |
| [RFC-2460] | Internet Protocol, Version 6 (IPv6) Specification Stephen E. Deering and Robert M. Hinden, December 1998. |
| [RFC-2616] | Hypertext Transfer Protocol — HTTP/1.1 Roy Fielding, et al., June 1999. |
| [RFC-3056] | Connection of IPv6 Domains via IPv4 Clouds Brian E. Carpenter and Keith Moore, February 2001. |
| [RFC-3280] | Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile Russell Housley, et al., April 2002. |
| [RFC-3330] |
IANA, September 2002. |
| [RFC-3489] | STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs) Jonathan Rosenberg, et al., March 2003. |
| [RFC-3513] | Internet Protocol Version 6 (IPv6) Addressing Architecture Robert M. Hinden and Stephen E. Deering, April 2003. |
| [RFC-4122] | A Universally Unique IDentifier (UUID) URN Namespace Paul J. Leach, et al., July 2005. |
| [RFC-4346] | The Transport Layer Security (TLS) Protocol Version 1.1 Tim Dierks and Eric Rescorla, April 2006. |
| [RFC-4380] | Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs) Christian Huitema, February 2006. |
| [Ristenpart-2009] | Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds Thomas Ristenpart, Eran Tromer, Hovav Shacham and Stefan Savage, CCS’09, Chicago, November 2009. |
| [Saffer-2007] | A Call to Arms for Interaction Designers Dan Saffer, Adaptive Path, 2007. |
| [Slee-2007] | Thrift: Scalable Cross-Language Services Implementation Mark Slee, Aditya Agarwal and Marc Kwiatkowski, Facebook, 2007. |
Ignore This
“We need to learn new designs, design frameworks, and design approaches from naturally occurring, ultra large scale systems (e.g. biology, ecology)” [Gabriel-2007]
Redis 32-bit
For example, the following message written by @tav and sent to #espians:
/balance ~/account
Would look like the following in Ruby after being parsed:
{
:from => "tav",
:aspect => "/balance",
:to => "#espians",
:value_number => 9203180132
}
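As a rough illustration of the parsing step above, here is a minimal Python sketch. The function name `parse_message`, the `/aspect value` grammar and the plain `value` key are assumptions made for illustration, not Ampify's actual parser. The numeric message text in the usage note is also made up, chosen so that it yields the `value_number` shown above.

```python
import re


def parse_message(sender, recipient, text):
    """Split a '/aspect value' message into a dict shaped like the hash above.

    Hypothetical helper: the grammar assumed here is a leading '/aspect'
    token followed by an optional value.
    """
    text = text.strip()
    match = re.match(r"^(/\S+)\s*(.*)$", text)
    if match is None:
        # No leading aspect: keep the whole text as a plain value.
        return {"from": sender, "to": recipient, "value": text}
    aspect, value = match.groups()
    parsed = {"from": sender, "aspect": aspect, "to": recipient}
    if re.fullmatch(r"\d+", value):
        # Purely numeric values are stored under value_number.
        parsed["value_number"] = int(value)
    elif value:
        # Anything else is kept verbatim (assumed key name).
        parsed["value"] = value
    return parsed
```

Calling `parse_message("tav", "#espians", "/balance 9203180132")` gives a dict with the same four fields as the Ruby hash, while a non-numeric value such as `~/account` is kept as a string under `value`.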
>>> nodule.init()
[0, 'START']
>>> status == 12
True
>>> def foo():
...     if N_set: return True
[0, 'START']
invisible mode
calendar/location merge
✗ Trust metrics calculation
✗ Weather Theme as ambient system status indicator
✗ Nodules
✓ Weblite base framework ✓ Weblite service loader ✓ Tamper-proof cookies ✓ HTTP only cookies
✓ Appengine manager script ✓ Remote API handler script ✓ Python third party dependencies
✓ Minification script ✓ Lint checker script ✗ Javascript third party dependencies
✗ Build script for jsutil
✗ nodebrowser.js builder ✗ nodebrowser.js port of env.js ✗ nodebrowser.js runner ✗ nodebrowser.js testing
http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html taskset (part of schedutils) for CPU affinity on Linux
✗ Go build script
✗ Python client setup.py/build script ✗ Python client testing framework ✗ Python client testing runner
✓ Appengine base yaml files ✓ Site configuration file ✓ Site secret file
✗ Main site template ✗ Main site css ✗ Border-based controls
✗ User identity framework ✗ User identity core model ✗ Invitation ✗ Membership ✗ Membership payment (via PayPal) ✗ Terms of service acceptance with updates
http://orestis.gr/blog/2008/02/09/moving-to-numpy/
6.8 billion people according to the UN: http://en.wikipedia.org/wiki/World_population
http://orestis.gr/blog/2008/03/25/django-localdates-ping/
✗ Login ✗ Email ✗ OpenID ✗ Google ✗ Facebook ✗ Twitter ✗ Flickr ✗ Domains
✗ Profile (incl. Skype) ✗ Profile picture ✗ User settings/configuration
✗ App/service keys ✗ App revocation ✗ OAuth ✗ CSRF ✗ User quotas ✗ Scheduled/unscheduled maintenance support
✗ Item framework ✗ Item core model ✗ Item log model (trends, etc.)
✗ Item creation ✗ Sensor notification on create ✗ Aspect/value parsing ✗ Builtin types ✗ Item indexing ✗ Item attachment ✗ File uploading
✗ Private spaces (capabilities) ✗ Private space creation ✗ Private space invitation
✗ Reaction definition ✗ Builtin incr/decr reactions ✗ Calling of before/after reactions ✗ Private messaging ✗ Espra credits allocation ✗ Espra credits usage
✗ Item editing ✗ Item versioning ✗ Item positioning ✗ Item deleting
✗ Amp-ing ✗ Re-spacing ✗ Pecu-allocating ✗ Commenting ✗ Symlinking ✗ Original chain traversal
✗ Trustmap definition ✗ Trustmap query ✗ Trustmap editor ✗ Recursive trustmap
✗ Item search ✗ Search syntax ✗ Client-side search results merge ✗ Parallel query ✗ Merged aspect/value query
✗ Real-time update ✗ Live node ✗ Sensor network ✗ Sensor node ✗ Sensor pattern match finder
✗ Redis server ✗ Redis client
✗ Worker node ✗ Resync checker ✗ Web hooks POSTer
✗ HTTP server ✗ Server connection stats ✗ Web Sockets support ✗ Web Sockets client-side support ✗ Iframe client-side support
✗ HTTP cache: etags, last-modified, if-modified-since ✗ Client-side caches ✗ Item caches ✗ Dolumn-cached loading
✗ Session management ✗ Dolumns ✗ Scrolling
✗ Item views ✗ Listing view ✗ Calendar view ✗ Streak calendar view ✗ Map view ✗ Arbitrary view
✗ Autocompletion ✗ Autogrowing editor ✗ Inline editor ✗ Auto-preview ✗ Position editor ✗ Uploader ✗ Map location selector ✗ Date/time selector
✗ Widgets ✗ Error handling ✗ Error reporting ✗ Resync notification ✗ General notifications
✗ Userscripts definition ✗ Userstyles definition ✗ Background uploading ✗ Userscript hooks/ordering ✗ Safe login
✗ OEmbed ✗ OEmbed (Skitch) ✗ Longurls
✗ Links ✗ Includes ✗ XSS sanitisation
✗ Lambdascript ✗ Lambdascript parser ✗ Lambdascript exec ✗ Builtins ✗ Unicode support ✗ Formatters ✗ I18n support ✗ I18n data ✗ Timezone support ✗ Timezone data ✗ Decimal support ✗ Queries ✗ Function definition ✗ Views ✗ Text type (rst, markdown, etc.) ✗ Paste type ✗ Syntax highlighting ✗ Pattern definition ✗ Pattern extraction
✗ Media player ✗ YouTube ✗ MP3s ✗ FlowPlayer
✗ Pecu app ✗ Pecu allocation ✗ Auto-allocation ✗ Top rated ✗ Top rated in period ✗ Pecu payouts ✗ Pecu integration with PayPal
✗ Shaila app ✗ Shaila creation mode ✗ Shaila grab: Flickr, Twitter, etc. ✗ Shaila browsing
✗ Packaged views ✗ Packaged views creator ✗ Packaged views verifier/installer
✗ GitHub integration ✗ Google calendar integration: in ✗ Google calendar integration: out
✗ RSS integration: in ✗ RSS integration: out ✗ Twitter integration: out
✗ Domain validation ✗ Domain view definition ✗ Domain view rendering
✗ Node logging support ✗ Node logging redis server ✗ Node log monitor app
✗ Visualisation framework ✗ Graph (trustmap-like) viewer ✗ Temporal viewer
✗ Email integration ✗ SMS integration ✗ XMPP integration ✗ Google Wave bot
✗ DNS setup ✗ AMI setup ✗ Redis configuration ✗ Runner scripts ✗ Git repository app sync
IANA-reserved IPs
✗ Code activity on GitHub
✗ FAQs on Tender ✗ <meta name="verify-v1" content="aLcMPDDmDYvgufTDcJiiCOgPt/FOooMmGHiPj64BMbU=" />
✗ optimise page load with different asset subdomains ✗ fallback for when 2000 supporters reached ✗ redis-based stat.ampify.it ✗ Update HSBC IBAN/BIC details
✗ Main template ✗ Main css ✗ Main javascript ✗ Main index
hmz
✗ Blog template ✗ Blog index ✗ Blog rss ✗ Blog first entry
✗ Disqus on Blog ✗ Feedburner for Blog
✗ TypeKit
- Devboard (incl. contributor agreement)
✗ Devboard template ✗ Devboard handler ✗ Devboard auth ✗ Devboard GitHub post receive ✗ Devboard breakpad ✗ Devboard tree status ✗ Devboard build status
✗ Devsite template ✗ Devsite index ✗ Devsite article: git review
See the video of an attractive city generator by the folks at ETH's CAAD department. It's an interesting case study of using
http://www.mas.caad.arch.ethz.ch/Exhibition/Helvepolis
✗ Git review ✗ Git review config ✗ Git review style validator ✗ Git review gofmt ✗ Git review revhooks ✗ Git review watchlist ✗ Breakpad
✗ Build slave config ✗ Build slave runner ✗ Python test framework ✗ Python tests ✗ Go test framework ✗ Go tests ✗ Javascript test framework ✗ Javascript tests ✗ Test runner
✗ ./configure ✗ Makefile ✗ git update
✗ Sendgrid subscription ✗ Supporters site sendgrid support ✗ Supporters site mail outs
✗ Trustmaps: update to the .com site ✗ Trustmaps: quick fixes ✗ Trustmaps: docs clean up ✗ Trustmaps: stream ✗ Trustmaps: article for blog ✗ Trustmaps: video
✗ App engine rate limit increase: trustmaps ✗ App engine rate limit increase: supporters ✗ Twitter rate limit increase ✗ Change passwords (twitter, facebook, gmail, etc.)
✗ DNS settings ✗ Apache setup ✗ Possibly get a linode backup + DNS changes ✗ Blog article on ampify ✗ Email espians
bookmarklet assets library game xp points
conflict resolution perspectives
http://marijn.haverbeke.nl/codemirror/
http://docs.jquery.com/QUnit http://ejohn.org/blog/test-swarm-alpha-open/ http://github.com/jelmer/dulwich/blob/master/dulwich/server.py http://github.com/jeresig/testswarm/ http://testswarm.com/ http://testswarm.com/job/2/ http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.rewrite_ports.html http://github.com/lg/murder http://66.102.9.132/search?q=cache:eWcESlDEifAJ:open-content.net/specs/draft-jchapweske-thex-02.html+thex+content&cd=3&hl=en&ct=clnk&client=safari
http://code.google.com/p/google-caja/wiki/CajaCajole http://www.codebasehq.com/features
http://howtonode.org/do-it-fast http://howtonode.org/control-flow-part-ii http://github.com/creationix/do#readme http://ejohn.org/blog/ecmascript-5-objects-and-properties/ http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
http://standardinterface.org/ http://github.com/280north/aristo
http://allmydata.org/source/tahoe-lafs/trunk/docs/about.html
https://calomel.org/nginx.html
http://api.pyamf.org/ http://fmspy.org/docs/en/userguide.html#features http://sziebert.net/posts/server-side-stream-recording-updated/ http://rtmpy.org/wiki/IrcChannel
http://linux.softpedia.com/get/Programming/Libraries/redis-queue-54702.shtml
http://a.thymer.com/accounts/register/ http://news.ycombinator.com/item?id=180311
http://code.google.com/p/pycopia/
Map this calendar schedule
Planfile IE
magic geoip
http://help.simplegeo.com/faqs/api-documentation/endpoints
bash completion
AmpOS base OS/distro requirements:
- Minimal
- Secure
- Performant
- Auto-updatable system image
- OpenVZ / VServer support
- Optimised to run on either raw metal or inside a VM
- Current, security patched kernel
- Support for both user and dev instances
- Possibly be the base for AmpOS ?
- 32-bit / 64-bit
- ARM port?
standard interfaces
young partially blind regular power user
support infrastructure for exodus day
write-to and write-about permissions
captive portals
live connection — relay social routing
.names for providers
machine learning algorithms — including bayesian classifiers
pecu ranking datastore host/provider + social routing
annotations
rtl
iframe — clickjacking protection support
analytics
opening new dol, choose (sub-)identity







