Posts Tagged informationscience
When to Pause, When to Push
Posted by admin in entrepreneurs on 12Feb09

It’s now 11pm on Wednesday night. Tomorrow morning, at 10am, I will be presenting my Project Plan to execute $6M worth of custom software development over the next 36 months.
That Project Plan doesn’t really exist yet.
It’s been a busy week. LAST night, at roughly 11pm, I filed a Notice of Intent to bid on a DIFFERENT multi-million-dollar, multi-year contract. Oh, and yesterday was also my oldest daughter’s sixth birthday.
There’s a point, in here, somewhere. We’ll wind our way towards it.
Technically, these days I’m an “Information Worker”. What I think that means is that I get paid for thinking about things. At least, that’s how I choose to interpret it. My clients probably prefer to think I get paid for the OUTPUT of my thinking – but I’m all too keenly aware of how directly the quality of my output is related to the quality of my thinking.
<N> Reasons Why Open Standards, more than Open Source, Really Matter
Posted by admin in entrepreneurs on 10Feb09
There have been some great articles on the dangers of having either no standards or closed standards. However, no one has really talked about how almost EVERYTHING we have accomplished as a species is owed to open standards of information exchange and interface. So let’s take a walk back through the ages and look at the wonderful things that open standards have brought us.
1. Numbers
Regardless of the language they use, or even the character set they use for writing it, most countries on the planet now use the “decimal positional notation” for all numbers and mathematics. This public, open standard for notation has allowed the development of relatively friction-free international commerce, and was the successful basis for…
Email address as OpenID
Just a strange thought I had this afternoon, based mostly on a report I’ve been drafting (on the market opportunities of demographically-specific technology addictions), which highlights some interesting points:
Everyone on the internet uses email.
I’ve played around with http://identitu.de, which takes your Facebook account and turns it into an OpenID. That is wickedly cool – if you have a Facebook account.
But what if all you have is email? Could we take an email account, verify it (using IMAP or POP3), and present THAT as your digital identity?
I think we can. Watch this space for my attempt.
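Here is a minimal sketch of the verification half of that idea, assuming "verify" means attempting an IMAP login with the user's credentials: if the server accepts them, the caller controls the mailbox. It uses only Python's standard `imaplib`; `verify_mailbox` and its parameters are hypothetical names, not part of any OpenID library or the author's implementation.

```python
import imaplib


def verify_mailbox(address, password, host, imap_factory=imaplib.IMAP4_SSL):
    """Return True if address/password can log in to the IMAP server at host.

    Hypothetical helper: a successful LOGIN is taken as evidence that the
    caller controls the mailbox. imap_factory is injectable for testing.
    """
    try:
        conn = imap_factory(host)
    except (OSError, imaplib.IMAP4.error):
        # Server unreachable, TLS failure, or a bad greeting from the server.
        return False
    try:
        conn.login(address, password)
        return True
    except imaplib.IMAP4.error:
        # The server rejected the credentials.
        return False
    finally:
        # LOGOUT is valid in any IMAP state, so close the session either way.
        try:
            conn.logout()
        except (imaplib.IMAP4.error, OSError):
            pass
```

One obvious trade-off: whoever runs this check sees the user's mail password, so it would have to live at the identity provider rather than at each site accepting the identity.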
RDF – The Graph, the Truth (value) and the (sql) Lite
Posted by admin in entrepreneurs on 04Aug07
At the heart of Web2.0, and indeed the new digital Age, is data. Lots and lots of it.
Much of this data is data about OTHER data – metadata.
Turning data into information involves being able to make connections and inferences between disparate BITS of data, recognizing and analyzing emergent patterns.
Keep all of this in mind on the ride ahead…
Many years ago, the good folks at Mozilla (which, in the days before it became the organization that could build anything, was known as Netscape) had to make some choices about the data that lives inside a web browser. Roughly speaking, these choices were:
* How should we store it?
* What structure should we model it after?
* What interface (or metaphor) should it present to the rest of the platform, and to the world?
The structure they chose was a directed graph, and the interface was RDF, a simple reflection of that graph. Finally, they chose to store it in a variety of ways: as XML files in the general case, as HTML files in the case of bookmarks, as a hash map in memory, and occasionally in Mork.
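To make "directed graph" and "RDF as a reflection of it" concrete, here is a minimal Python sketch of an in-memory graph of (subject, predicate, object) triples backed by hash maps. The class and method names are hypothetical; they only loosely echo the `Assert`/`Unassert`/`GetTargets`/`GetSources` vocabulary of Mozilla's `nsIRDFDataSource`, and this is not Mozilla's code.

```python
from collections import defaultdict


class TripleGraph:
    """Hypothetical in-memory directed graph of (subject, predicate, object) triples.

    Each triple is a labelled arc: subject --predicate--> object.
    Two hash-map indexes make forward and backward lookups cheap.
    """

    def __init__(self):
        self._out = defaultdict(set)  # (subject, predicate) -> {objects}
        self._in = defaultdict(set)   # (predicate, object) -> {subjects}

    def assert_(self, s, p, o):
        # "assert" is a Python keyword, hence the trailing underscore.
        self._out[(s, p)].add(o)
        self._in[(p, o)].add(s)

    def unassert(self, s, p, o):
        self._out[(s, p)].discard(o)
        self._in[(p, o)].discard(s)

    def targets(self, s, p):
        # Follow arcs forward: which objects does s point to via p?
        return set(self._out.get((s, p), ()))

    def sources(self, p, o):
        # Follow arcs backward: which subjects point to o via p?
        return set(self._in.get((p, o), ()))
```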
The reasons for these decisions are, largely, lost in time. (Mork, especially, seems to be entirely lost as well – although I have the source code, I have no idea how it works or what it looks like internally. Fortunately, I don’t have to.) And many of today’s coders second-guess these choices, or reject them without review. But recently we have been faced with the same questions at Flock – what structure, what interface, and what storage mechanism? To talk about how we’ve answered that, let’s talk about the data again.
Flock is a browser about people. And when you talk about people and metadata in the same conversation, things can get very busy very very fast. “Jonnie wrote a new blog post.” “Sally commented on Jonnie’s blog post.” “Billy took a picture of Sally, and posted it on Flickr.” “Billy, Sally and Jonnie are now friends.”
One pattern immediately jumps out – lots of data. Lots of unique events. But very few unique ITEMS. A small number of people (hundreds, maybe thousands), doing a small number of things (friending, commenting, blogging, photoing), in a small number of ways. All of it, related to itself. Graph, anyone?
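Written as RDF-style triples, the four sample sentences above show that pattern in miniature: every statement adds a new arc, but the same few nodes keep reappearing. The predicate names (`wrote`, `commentedOn`, and so on) are made up for illustration and are not Flock's actual vocabulary.

```python
from collections import Counter

# Hypothetical encoding of the four example sentences as (subject, predicate, object).
EVENTS = [
    ("Jonnie", "wrote", "post:1"),
    ("Sally", "commentedOn", "post:1"),
    ("Billy", "took", "photo:1"),
    ("photo:1", "depicts", "Sally"),
    ("photo:1", "postedOn", "Flickr"),
    ("Billy", "friendOf", "Sally"),
    ("Billy", "friendOf", "Jonnie"),
    ("Sally", "friendOf", "Jonnie"),
]


def summarize(triples):
    """Return (number of arcs, set of distinct nodes, arcs touching each node)."""
    degree = Counter()
    for subject, _predicate, obj in triples:
        degree[subject] += 1
        degree[obj] += 1
    return len(triples), set(degree), degree


arcs, nodes, degree = summarize(EVENTS)
print(f"{arcs} arcs over {len(nodes)} nodes; Sally appears in {degree['Sally']}")
# prints "8 arcs over 6 nodes; Sally appears in 4"
```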
So Mozilla’s structure seems like a good fit. Next we come to the interface. Ah, RDF. That great, evil monstrosity. Kill it, bury it. Put it out of its misery. “It’s complicated,” they say. “It’s legacy, archaic,” they whine. “What is it good for?”
And that’s a funny question. Because, really, if you do a quick Google search for “RDF -mozilla”, you’ll find that the only major thing it’s been used for is… FOAF. Friend of a Friend. People data. Huh.
Simple factors to recommend RDF:
1. A directed graph is a good fit for the data, and RDF is a good semantic model of a directed graph.
2. RDF code is FAST.
Now, immediately folks will start taking issue with this. “Fast, you say?” “Look, it takes a full 3 SECONDS to write to disk!” “Look at how it hangs the browser when we’re trying to add things to the RDF!”
Let’s back up a minute and talk about storage, because here, I believe, is where Mozilla made an understandable mistake. You see, with Mork holding browser history and HTML holding the bookmarks, the only thing left in RDF was the localstore, a simple collection of miscellaneous UI-related facts that were serialized only on shutdown. If they were lost in a crash, no big deal. So if the XML serialization code was slow, who cares, right?
Not so.
Skipping ahead a bit, let’s look at what Flock has added to the equation, and why I think it matters.
Step 1: RDF is a bitch to use.
It’s true – too many interfaces, too many services, and ASSERTING and UNASSERTING are a totally new grammar for most folks. Enter Ian McKellan, stage left, the author of Coop.js. (Don’t confuse this with Mozilla’s The Coop, which came out a year later and is a totally different beast.)
So what is it? Think ActiveRecord for the directed graph. It’s a JavaScript ORM (object-relational mapper) that makes it easy to read from and write to an RDF datasource, with surprisingly small overhead. (Ian will be surprised by that last part, but we’ve done a few things to coop since he last saw it.)
Great, now I can read from and write to the graph just like getting and setting properties of a JavaScript object. But what about the SPEED?
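Python's `__getattr__` and `__setattr__` can stand in for JavaScript property accessors, which gives a rough feel for what "ActiveRecord for the directed graph" means. This is a minimal sketch under that assumption, not Coop.js's actual API: `Graph`, `Node`, and the one-value-per-property rule are all hypothetical simplifications.

```python
class Graph:
    """Hypothetical minimal triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def target(self, s, p):
        # Return the object of the first (s, p, *) arc found, or None if there is none.
        for ts, tp, to in self.triples:
            if ts == s and tp == p:
                return to
        return None

    def set_target(self, s, p, o):
        # Unassert any existing (s, p, *) arcs, then assert (s, p, o) unless o is None.
        self.triples = {t for t in self.triples if (t[0], t[1]) != (s, p)}
        if o is not None:
            self.triples.add((s, p, o))


class Node:
    """ActiveRecord-style wrapper: attribute reads and writes become graph arcs."""

    def __init__(self, graph, uri):
        # Store internals with object.__setattr__ so they don't become arcs.
        object.__setattr__(self, "_graph", graph)
        object.__setattr__(self, "_uri", uri)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for graph-backed properties.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._graph.target(self._uri, name)

    def __setattr__(self, name, value):
        self._graph.set_target(self._uri, name, value)


graph = Graph()
sally = Node(graph, "urn:person:sally")
sally.name = "Sally"
sally.friendOf = "urn:person:billy"
print(sally.name, sally.friendOf)  # prints "Sally urn:person:billy"
```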
Step 2: Get rid of the XML.
As part of his coop efforts, Ian had prototyped a SQL-backed RDF implementation, hoping to use SQL statements directly to work around some of the more expensive computations against a traditional graph (such as SUMS and COUNTS). We (Bruno, actually) ported that to C++ for speed, finished it off, and glued an in-memory hash map to the side of it as a cache. Voilà – now every change is written out immediately, there’s no periodic 3-4 second freeze while the entire graph is serialized, and we’ve got a framework on which to hang further performance improvements via direct SQL queries.
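The shape of that design (write every change through to SQLite immediately, keep a hash-map cache in front for reads, and push aggregates such as COUNT down into SQL) can be sketched with Python's built-in `sqlite3` module. Flock's version was C++; this is only an illustration, and the table layout and method names are assumptions.

```python
import sqlite3


class SqliteTripleStore:
    """Hypothetical store: triples persisted to SQLite on every change, plus a read cache."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
            " PRIMARY KEY (s, p, o))"
        )
        self.cache = {}  # (s, p) -> set of objects, filled lazily from SQLite

    def assert_(self, s, p, o):
        # Write through: commit this one change now instead of re-serializing everything.
        with self.db:
            self.db.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))
        if (s, p) in self.cache:
            self.cache[(s, p)].add(o)

    def unassert(self, s, p, o):
        with self.db:
            self.db.execute(
                "DELETE FROM triples WHERE s = ? AND p = ? AND o = ?", (s, p, o)
            )
        if (s, p) in self.cache:
            self.cache[(s, p)].discard(o)

    def targets(self, s, p):
        # Serve reads from the cache; on a miss, load that (s, p) slice from SQLite.
        if (s, p) not in self.cache:
            rows = self.db.execute(
                "SELECT o FROM triples WHERE s = ? AND p = ?", (s, p)
            )
            self.cache[(s, p)] = {o for (o,) in rows}
        return set(self.cache[(s, p)])

    def count_arcs(self, p, o):
        # Aggregates go straight to SQL rather than walking the graph in memory.
        (n,) = self.db.execute(
            "SELECT COUNT(*) FROM triples WHERE p = ? AND o = ?", (p, o)
        ).fetchone()
        return n
```

Committing once per assert is the simplest write-through policy; batching several asserts into one transaction is the usual lever for raising SQLite's asserts per second.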
But there’s still something missing.
Step 3: Split it up.
Take another look at the data we’re dealing with here. People data, hmmm. Much of this data, like browser sessions, is transient. It really shouldn’t get written to disk at all. Suppose we had a separate datasource, purely in memory, into which we could stuff all this TRANSIENT data, where it could magically go away when the browser shuts down? Suppose we could somehow COMBINE these datasources so that, to the UI layer, they would appear as a single, COMPOSITE datasource?
Those of you who know RDF know that I’m being a bit tongue-in-cheek here, since one of the beautiful parts of RDF is its ability to COMPOSITE various datasources together (although there were a half-dozen bugs in Mozilla’s implementation of this that we had to iron out first).
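A composite datasource can be sketched as a read-only view that unions the answers of several underlying stores, so a persistent store and a purely in-memory session store look like one graph to the UI. In this Python sketch, `MemoryStore` and `CompositeStore` are hypothetical names, and writes go directly to whichever store owns the fact; how a real composite routes writes is a separate design question.

```python
class MemoryStore:
    """Hypothetical in-memory datasource holding (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def assert_(self, s, p, o):
        self.triples.add((s, p, o))

    def clear(self):
        # Stand-in for "the browser shut down": transient facts simply vanish.
        self.triples.clear()

    def targets(self, s, p):
        return {o for ts, tp, o in self.triples if ts == s and tp == p}


class CompositeStore:
    """Read-only view that unions the answers of several datasources."""

    def __init__(self, *sources):
        self.sources = sources

    def targets(self, s, p):
        result = set()
        for source in self.sources:
            result |= source.targets(s, p)
        return result


persistent = MemoryStore()  # stands in for the SQLite-backed store
session = MemoryStore()     # transient; never written to disk
persistent.assert_("Billy", "friendOf", "Sally")
session.assert_("Billy", "currentlyViewing", "photo:1")
view = CompositeStore(persistent, session)
```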
Step 4: Watch it carefully.
A slight digression here for the particularly geeky: the new-and-improved observer. (Yet another Ian invention, executed this time by Mr. Yosh.) One of the most important parts of data-driven UI is that it should respond dynamically to changes in the data, and while the RDF Templates system does this quite well, there are cases where it’s not the right tool for the job, such as when you only want to show the first ‘n’ items of a list, or when you need pagination. (A mammoth oversight requiring an equally herculean effort to resolve. Template code is not for the faint of heart.) The traditional approach to this was a simple nsIRDFObserver, with a catastrophic side effect: calling into arbitrary JavaScript for EVERY CHANGE in the RDF can become prohibitively expensive almost immediately, and in almost every case you only care about a small subset of the changes that are occurring. The ArcObserver allows you to specify which patterns your code is interested in, and receive notifications only of RDF events matching that pattern.
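The pattern-filtering idea can be sketched as follows: observers register a (subject, predicate, object) pattern with wildcards, and a callback runs only when a change matches it. This is a minimal Python sketch of the concept, not Flock's ArcObserver interface; `ObservableGraph`, `observe`, and the `ANY` wildcard are hypothetical.

```python
ANY = None  # wildcard: matches any value in that position of a pattern


class ObservableGraph:
    """Hypothetical triple set that notifies only observers whose pattern matches."""

    def __init__(self):
        self.triples = set()
        self._observers = []  # list of (pattern, callback)

    def observe(self, pattern, callback):
        # pattern is an (s, p, o) tuple; ANY in a position matches anything.
        self._observers.append((pattern, callback))

    def _notify(self, event, triple):
        for pattern, callback in self._observers:
            if all(want is ANY or want == got for want, got in zip(pattern, triple)):
                callback(event, triple)

    def assert_(self, s, p, o):
        triple = (s, p, o)
        if triple not in self.triples:
            self.triples.add(triple)
            self._notify("assert", triple)

    def unassert(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:
            self.triples.remove(triple)
            self._notify("unassert", triple)


graph = ObservableGraph()
seen = []
# Only comments matter to this hypothetical UI widget; friendships are ignored.
graph.observe((ANY, "commentedOn", ANY), lambda event, t: seen.append((event, t)))
graph.assert_("Billy", "friendOf", "Sally")
graph.assert_("Sally", "commentedOn", "post:1")
graph.unassert("Sally", "commentedOn", "post:1")
```

This version still tests every registered pattern on each change; indexing observers by predicate would let a store skip non-matching observers entirely, which matters once there are many of them.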
Step 5: Profit?
There are still bugs, of course. (Check out the Flock Bugzilla site and search for RDF.) And there are vast opportunities for optimization. But here’s the state of the union:
1. We have a rich, well-matched data model.
2. We have a convenient set of tools for changing (coop) and observing (ArcObserver) that model.
3. We have rapid (Templates) and sophisticated (coop + E4X) ways of driving UI from that model.
4. We have a storage mechanism that is flexible (through compositing, we can use ANY data storage mechanism we’d like) and FAST (the current SQLite RDF datasource will accept 1000 asserts per second, and we expect to double that number with planned improvements).
So in the end, maybe Mozilla was right the first time. I’d like to think that Flock is proving out the promise of RDF. In a world of data and metadata, where much of it is homogeneous or intimately related, there is yet another truth: some of it never is. Some service, or some person, will want to store a fact about themselves or their relationships that no other service or person employs.
How would you cram that into a relational table structure?
How would you try and derive value from it?
Would it be any better than a graph?
Next up – best practices for Templates and Bindings, or, how to make the UI extensible as part of your open API.
Blogged with Flock
