Ideas

Fantastic ideas

I said that the next article would look at Simon Phipps’s ideas about the open source cycle in more detail, but instead there’s a small interlude, because last night I heard Steve Cook from Microsoft talk at one of a series of British Computer Society Software Practice Advancement meetings in Cambridge.

Steve is one of the software architects working on Visual Studio. His work concerns Software Factories and his talk, which I thought was spectacularly good, was titled ‘Software Factories: Assembling Applications with Patterns, Models, Frameworks and Tools’.

Like all good talks, his was absolutely jam-packed with ideas. A sort of brain-dump. He has obviously been thinking a great deal about the area for years, and it showed. Unfortunately I was so captivated by what he was saying that I didn’t keep good notes, so what follows is from memory. At some point I may have the slides from the talk, and I’ll update this. In the meantime, Steve, I apologize if I’ve misrepresented anything you said or missed out some important points.

He started with the thesis that software development needs to move (and is moving) away from artisanship towards something better. He spoke a bit about the phases that other industries have moved through (artisanship, mass production, continuous improvement and mass customisation) and tried to draw parallels with the software industry. It’s an idea I agree with in part, although I thought the analogies he used were misleading; the software industry spends most of its time and effort in the product development phase of a product lifecycle, in contrast to most other industries where production is the part that takes most effort. Despite this, I agree that there’s a clear trend towards using higher and higher abstractions in order to allow us to accomplish more. And Steve’s ideas are all about the way in which we can accomplish this: through Software Factories.

Next he looked at what a Software Factory has to accomplish. He presented a convincing case that in order to use knowledge effectively, it has to be readily available at the point of application. Many (possibly most) re-use programs fail because finding and re-using the knowledge is harder than rebuilding the component, for instance. Corporate standards often fail because the group mandating the standard has no power or mechanism by which to apply the standard at the time and in the place where it would be beneficial; the first thing that developers inside corporations do when presented with a new project is not to open their book of corporate rules. They open an editor. Steve convincingly showed that for re-usable knowledge to be successful in practice, tools that understand the context of your work are necessary. You only want GUI design guidelines to be on hand when you are designing a user interface. You only want coding guidelines when you are writing code. Your viewpoint provides the context.

Next he looked at how we could provide knowledge, and here we have to take a little leap of faith: Steve thinks that little domain-specific languages are the way that we can codify and re-apply knowledge. He gave an example of a domain-specific language which modelled phone billing. He showed a model for the process, and demonstrated conceptually how a domain-specific language could be used to capture the points of variability between several product lines, for instance. To produce each application, you would vary the input to your domain-specific language model, use this to generate code, and the code would plug into the framework for your application, the framework being the part that doesn’t vary. For the phone example, using the domain language you could generate many billing systems – each of which would be suitable for a type of customer using a particular phone payment plan.
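To make the generate-and-plug-in idea concrete, here is a minimal sketch in Python. It is not Steve's tool or his billing model; the plan fields (`monthly_fee`, `included_minutes`, `rate_per_minute`), the template and the function names are all made up for illustration. The template stands in for the fixed part, and each small model captures only what varies between plans.

```python
from string import Template

# The fixed part: every generated billing function has this shape.
# (Hypothetical template; amounts are in pence.)
BILLING_TEMPLATE = Template('''\
def bill(minutes):
    """Monthly charge in pence for the '$plan' plan (generated code)."""
    extra = max(0, minutes - $included_minutes)
    return $monthly_fee + extra * $rate_per_minute
''')


def generate_billing_code(model):
    """Turn one plan model (the variable part) into Python source."""
    return BILLING_TEMPLATE.substitute(model)


def load_generated(source):
    """Compile generated source and hand back its bill() function."""
    namespace = {}
    exec(source, namespace)
    return namespace["bill"]


# Two models: the only things that vary between the product lines.
PAY_AS_YOU_GO = {"plan": "pay-as-you-go", "monthly_fee": 0,
                 "included_minutes": 0, "rate_per_minute": 25}
CONTRACT = {"plan": "contract", "monthly_fee": 1500,
            "included_minutes": 200, "rate_per_minute": 10}

payg_bill = load_generated(generate_billing_code(PAY_AS_YOU_GO))
contract_bill = load_generated(generate_billing_code(CONTRACT))

print(payg_bill(100))      # 100 minutes at 25p each
print(contract_bill(250))  # fee plus 50 minutes over the allowance
```

A real domain-specific language would have its own notation and a proper generator; a dictionary and a string template are just the smallest thing that shows the split between what varies and what doesn't.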

Of course you can use more than one domain specific language in the same product line, each controlling a particular aspect of the product line.

In the last part of his talk, Steve looked at a tool that his group has built to help with all of this. They have an add-on to Visual Studio which helps you to design domain-specific languages and a matching code generator, suited to your application. One lovely feature of their tool is that it’s the embodiment of a domain-specific language used to describe domain-specific languages; they have bootstrapped themselves.

In the closing minutes of his talk, Steve walked through some examples of their tool in action. I think that one of the examples was that if you worked in a bank you might use this tool to capture the knowledge of the variability between account types. This would allow people building applications to embed this knowledge in their work; building applications which automatically correctly manipulated the state of a bank account, and knowing with more certainty that the operation was correct because the code was generated using a proven domain language for describing bank accounts and transactions.

If you use Visual Studio, you can download a beta of their tool here. There’s also a wealth of other information about the ideas.

I’d seen the tools listed on the VS site before now and had a quick look. I wrote them off as uninteresting. I was wrong. I misunderstood the potential power that they have to help companies with particular types of projects.

I came away from the talk quite elated; the ideas seem to me to be one (meta) step beyond object libraries, and potentially very useful for certain classes of applications. I think that they represent an important step towards being able to automate the repetitive parts of software development, and an excellent way to allow you to capture knowledge, and reuse it.

In the ending moments of Steve’s talk, I was struck by the smartness of the ideas. It occurred to me that this is the sort of thing that open source doesn’t often produce, but which it should. I think that the reasons have to do with economics. More about this next time…

A long time ago, I got the chance to go to the Grand Canyon. It’s pretty amazing; I don’t really have the words to describe it adequately. It’s just amazing…at points it’s 10 miles from edge to edge, with a near vertical drop-off to the bottom. It’s huge. But at some places you can look across to the horizon and see flat land stretching outwards, then you take a few more steps and see that there is an enormous chasm between you and the edge of the world.

Software seems like that to me at the moment.

We’ve discovered the power of the Internet, and being connected. We’ve discovered that we can devise new applications for groups of people, that we can aggregate information for something larger than the individual bits. We’ve found out that being able to access information from anywhere is hugely useful.

But our desktops and the Internet are two completely separate software worlds.

If we build a web-based system then it’s accessed through a browser and can’t interact with our desktop.

If we build a traditional bit of software then it has to be installed to be used, and it will be subject to platform quirks and problems with particular configurations.

There’s a chasm between the two worlds.

AJAX and Web 2.0 are better, but at the root they’re an attempt to make the browser behave like the desktop. And they’re subject to their own problems. Nevertheless, I think that this is a step towards the solution.

Here’s what I want:

  • I want to be able to make applications that are client-server in nature;
  • They should expose a web interface for when I’m not at my PC;
  • They should have a federated data-store behind them; that is, there is some data I don’t want to expose to the Internet, but the data that is exposed to the net should be part of a bigger whole, rather than a copy. Alternatively if the data exposed to the net is a copy, then synchronisation should be 99.9% automatic;
  • I want to be able to run any application without being connected to the Internet, maybe with reduced functionality;
  • I want my applications to enjoy all the benefits of normal desktop interaction; the granularity of actions should be small; I don’t want to think of the world in pages, and I do want to be able to drag and drop;
  • I want to be able to try applications without an install. I’m willing to install if I decide that something is useful.
  • I want security; I don’t want my PC open to crackers.

Mostly, this vision is already achievable. (Microsoft is calling this sort of app a smart client.) But it takes hard work because you have to build the infrastructure by hand. In fact, most of the infrastructure is re-usable and should be abstracted into a tool and framework.

Now there are two or more environments that are partly suitable:

  • Java. Can run on the desktop. Can run on the server. Can run on multiple platforms. Can deliver responsive apps – witness Eclipse. But there’s no easy way to build dual-headed apps that I know of. Do you know of a way? There’s the ability to run from the web with Webstart.
  • .Net. Can run on many versions of Windows. Runs on client. Runs on server. Has good integration. Can build dual-headed apps. But it’s not great for delivering software if you don’t want the source to be readable. Granted you can get an obfuscator, but an industrial-strength obfuscator should be part of the basic package. A big downside: you need the giant 20MB runtimes to be on the client machine. There is no linker available as part of the MS package, which would reduce the size of most apps and allow you to ship a single file, but you can buy one from Remotesoft. And lastly, .Net isn’t big on platforms other than Windows.

I’m sure that there are other ways of doing this too; for instance, if you install the PHP and GTK runtimes on the client then you could use PHP locally and remotely. But other solutions are less integrated.

Overall, I don’t think that either of these solutions is great. It seems like it’s going to be write-n-times for a while.

I keep a ginormous todo document cum notebook cum scratchpad. Over time it’s been in different formats. It’s been a paper document, a text file and a Word document (that was a failure); currently it’s in an app called KeyNote.

I like KeyNote for a bunch of reasons:

  • It’s open source. Not very active, but if I ever want to get involved I could.
  • It sits in my system tray and is always available
  • It instantly saves changes. I don’t need to remember to press Save.
  • It saves all the notes in an open format, with all the notes in one file.

But I’ve got some annoyances and they are growing.

  • I want my notebook available all the time, wherever I am. If there’s a computer I should be able to use it. More about this below.
  • I want a much more dynamic categorisation system. KeyNote uses a tree and that’s good, it can make links. I like the wiki thing that links happen automagically. And I like to be able to tag to make links.
  • I’d like to be able to make portions available to other people.

Available all the time, everywhere. I want a system that works as a local application, but has a web repository and is useable via a web interface. The web interface must be slick. If there’s no internet connection then the software should continue to work. And it should sync up later.
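The work-offline-and-sync-later part can be sketched very simply. This is only an illustration under big assumptions: the `Notebook` class and its methods are hypothetical, the "web repository" is just a dictionary, and it ignores the hard bit, which is what to do when the same note has been changed in two places.

```python
class Notebook:
    """Local notebook that queues changes until a connection is available."""

    def __init__(self):
        self.notes = {}    # local copy, always usable
        self.outbox = []   # changes not yet sent to the repository

    def write(self, note_id, text):
        """Save locally straight away and remember the change for later."""
        self.notes[note_id] = text
        self.outbox.append((note_id, text))

    def sync(self, remote, online):
        """Push queued changes to the repository; return how many were sent."""
        if not online:
            return 0  # keep working locally, try again later
        sent = len(self.outbox)
        for note_id, text in self.outbox:
            remote[note_id] = text
        self.outbox.clear()
        return sent


repository = {}  # stands in for the web repository
book = Notebook()
book.write("todo", "buy milk")
book.write("todo", "buy milk and bread")

print(book.sync(repository, online=False))  # nothing sent while offline
print(book.sync(repository, online=True))   # both queued changes sent
print(repository["todo"])
```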

(Actually there’s a general problem here: I want lots of software that has a web interface to the data and a local version, both of which work on common data. It’s a problem that I keep thinking I should address, but there’s not much money in infrastructure. On the other hand maybe the solution could be embedded in a general tool like Dreamweaver embedded web design.)

I found EverNote last week, and I’m trying it out to see if it’s a viable replacement. It pretends it’s a never-ending roll of paper. You write new notes at the bottom, and group them by classifying them with tags.

It’s certainly got some of the features I want and there are lots of good things to recommend it. The basic version is free! (Not the code though.) It’s easy to add stuff. It’s pretty easy to tag. It sits in my system tray. It may be suitable. Sometime in the future it’s going to be able to sync, which suggests that it will be using a repository.

On the other hand, I’m finding the interface a bit finicky. And not pleasing to my eye. Maybe it’s because I don’t yet know how to use it. But it should be discoverable. Maybe they need a graphic designer on the team.

I had hopes that blogware would do the trick, but it doesn’t, because the focus is on publishing. At least not yet. Maybe things will change.

So, some market research: If I build the software, is there a market? How many of you are looking for this sort of thing? If you read this, and you think it would be interesting, send me a mail at notebook at tanasity.com. Don’t worry if you are reading this a couple of years after it was written. I don’t expect to be writing the program tomorrow. Tell me what features you yearn for.  

Search engines. I wonder if they’ve incorporated the idea of information decay rates in the relevance rankings.

Frinstance – if I search on Columbia, there are many possibilities. Here are a few:

  • The Columbia space shuttle
  • The Columbia river
  • Columbia records
  • A song by the band Oasis

If I had searched for Columbia immediately after the disaster, then it would have been most likely that I was interested in the disintegration, and it would have been good for references to the shuttle to be near the top of a search results list.

Some years later, the disaster is no longer of current interest, so it should no longer necessarily be near the top of the results.

I posit that bits of information have decay rates. The key facts about the Columbia river have a very slow decay rate; the river isn’t going away anytime soon. New facts may arise – for instance, if there was a toxic spill in the river – which would have different decay rates.

Results should be ranked according to relevance, where the relevance is calculated according to where the info is found along the decay rate curve. This might be a better algorithm than looking at links alone.
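Here is one way the idea could be sketched, purely as an illustration. I'm assuming an exponential decay curve with a half-life per item, which is my choice and not something any search engine is known to do; the scores, days and half-lives below are invented numbers.

```python
def decayed_relevance(base_score, age_days, half_life_days):
    """Relevance halves every half_life_days (assumed exponential decay)."""
    return base_score * 0.5 ** (age_days / half_life_days)


def rank(results, today):
    """Order results by decayed relevance, most relevant first."""
    return sorted(
        results,
        key=lambda r: decayed_relevance(
            r["score"], today - r["published_day"], r["half_life_days"]),
        reverse=True,
    )


# Invented figures: the news story starts out far more relevant but fades
# quickly; the facts about the river barely fade at all.
RESULTS = [
    {"title": "Columbia shuttle disaster", "score": 10.0,
     "published_day": 0, "half_life_days": 30},
    {"title": "Columbia river facts", "score": 4.0,
     "published_day": 0, "half_life_days": 36500},
]

print([r["title"] for r in rank(RESULTS, today=1)])     # shuttle first
print([r["title"] for r in rank(RESULTS, today=1000)])  # river first
```

The hard part, of course, is working out the half-life for each bit of information, and the sketch says nothing about that.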

It’s not an easy job. I wonder if this already happens. Do tell me.

 

One of the (many) things that bug me about software is the disconnect between design, documentation and the software itself. Some developers say that the source code is the documentation – and they’re right – the source code for a program tells you what it does in excruciating detail. But source code usually doesn’t tell you what the program should be doing, or what the design decisions were when the software was being developed, or the ways that the software has been extended or the background to the problem. Most of these things are held in other documents or people’s heads.

That’s very annoying.

The documents get out of date as the program overtakes them. They get lost in the pile. They lose their relevance.

It’s not a new problem. Twenty years ago Donald Knuth came up with Literate Programming. His idea was that you should build the program as part of a narrative and the tools should understand which parts of the document are the program itself and which parts are the documentation. Actually, that’s the ideal, but you can use existing tools with only a little work, by writing a literate program, then extracting the code and pushing it through your normal tools. In Literate Programming this is called tangling.
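A toy version of tangling looks something like this. The `@code` and `@end` markers are a convention I've made up for the sketch; they are not the syntax of Knuth's tools or of any other literate programming system, and real systems also let you name chunks and reorder them.

```python
def tangle(document):
    """Extract the code between @code and @end markers, in document order."""
    code_lines = []
    in_code = False
    for line in document.splitlines():
        marker = line.strip()
        if marker == "@code":
            in_code = True
        elif marker == "@end":
            in_code = False
        elif in_code:
            code_lines.append(line)  # prose lines are simply skipped
    return "\n".join(code_lines) + "\n"


LITERATE_SOURCE = """\
We greet the user first, because a friendly program is a used program.
@code
def greet(name):
    return "Hello, " + name
@end
Then we call it, and keep the result for later.
@code
message = greet("Knuth")
@end
"""

program = tangle(LITERATE_SOURCE)
print(program)  # only the code, ready for the normal tools

namespace = {}
exec(program, namespace)  # here the "normal tool" is just Python itself
print(namespace["message"])
```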

He and his students built an excellent system, but it has never caught on widely. There are a number of likely reasons:

  • Literate programming introduces a new layer of development and requires discipline
  • IDEs don’t support the style, and the benefits of using an IDE outweigh the disadvantage of not having Literate Programming.
  • It’s not already a standard
  • There aren’t tools that support lots of programming languages, while many projects require the integration of several toolsets
  • The tools that do exist are often rudimentary
  • There’s a lack of commercial support

In the meantime XML has grown up. A while back, some smart people realised that a document based on SGML or XML was a pretty good way to hold the elements of a Literate Program: the prose and the code. In fact some people went further and realised that it would be quite good if you could include a range of communication elements: text, graphs, pictures and more.

And now we come to OpenDocument. It’s got the markup for just about everything you need in order to create a workable Literate Programming environment, and it has the bonus that if you handled the document through a database-style manager, then you could have best-of-breed software to work on each part; you could write the code in your favourite IDE, the text in your favourite word processor, the diagrams with the most suitable diagram editor.

And everything would still be held in one place, accessible by many tools. With the added benefit that members of a team would all be able to work on the document at the same time.

Anyone interested in the details of what I think needs to go into this to turn out a product?

 

PhishFighting.com is an excellent idea. It’s aimed at destroying the data that phishers collect by flooding it with false data. 

Whenever you get a phishing mail you copy the URL, and paste it into the PhishFighting form. Then PhishFighting goes to work, posting a new dummy login every 20 seconds or so, for as long as you like.

It’s not a Denial of Service attack if the page is hosted on some unfortunate person’s trojaned computer; it just taints the value of the data collected by phishers, hopefully making their life much, much harder.

Here’s an idea for a great enhancement: wouldn’t it be great if you could get ‘tripwire’ accounts from eBay and banks? Banks would let you generate fake account details – a username and password. You could then feed these to phishers, and when the phishers used one of these fake tripwire accounts, the business would immediately know that a particular computer had been compromised or was the computer used by someone attempting fraud.

How could this be turned into a business? Perhaps by aggregating the data: you would provide a service to banks in which, as they got tripwire attempts, they’d send you the details, and you would share with each bank the list of all computers with a tripwire against them.

Potential problem: dial-up users have IP addresses that change regularly, so you couldn’t blacklist indefinitely. Otherwise someone dialling in could find themselves blocked by their bank. Banks and other businesses would have to respond with notices telling dial-up users to redial.
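The tripwire idea, including a block that expires so that dial-up users aren't locked out forever, could be sketched like this. Everything here is hypothetical: the `TripwireRegistry` class, the "tw-" username prefix and the one-hour default are my inventions, and no bank offers such an interface as far as I know.

```python
import secrets


class TripwireRegistry:
    """Issues fake credentials and flags any address that tries to use them."""

    def __init__(self, block_seconds=3600):
        self.block_seconds = block_seconds  # how long a flag lasts
        self._tripwires = set()             # usernames that are bait
        self._flagged = {}                  # ip -> time it was last flagged

    def issue(self):
        """Create a fake username and password to feed to a phisher."""
        username = "tw-" + secrets.token_hex(4)
        password = secrets.token_urlsafe(12)
        self._tripwires.add(username)
        return username, password

    def record_login(self, username, ip, now):
        """Return True (and flag the address) if a tripwire account was used."""
        if username in self._tripwires:
            self._flagged[ip] = now
            return True
        return False

    def is_blocked(self, ip, now):
        """A flag expires after block_seconds, since addresses get reassigned."""
        flagged_at = self._flagged.get(ip)
        return flagged_at is not None and now - flagged_at < self.block_seconds


registry = TripwireRegistry(block_seconds=3600)
bait_user, bait_password = registry.issue()

# Someone tries the bait account from a (documentation-range) address.
print(registry.record_login(bait_user, "203.0.113.7", now=0))
print(registry.is_blocked("203.0.113.7", now=1800))  # still inside the hour
print(registry.is_blocked("203.0.113.7", now=7200))  # flag has expired
```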

If you read this and decide to turn it into a service, do keep me in mind when you trade-sale or IPO. 

Patching Word

Because Microsoft have said that they have no plans to add OpenDocument format abilities to Word, there’s a huge opportunity to build a fix for Word that allows it to open, edit and save in OpenDocument format.

I’ve written bits of code in Word from time to time. It’s possible that this could be done from inside Word, but it’s not likely. However, in most places it’s legal to reverse engineer for interoperability, so it’s probably feasible to write the code so that it patches Word. In due course Microsoft will have to interoperate with OpenDocument, so there’s a chance that they would buy the company.