Author Archive for Andrew Tillman

I Hate You More Than Ever, XML

I love Milkman Dan.

Okay, so I don’t actually hate XML.

But recently I have been working on a syndication tool, and I am beginning to agree with the many people who question the use of XML for simple data exchange. XML was originally supposed to be both machine and human readable, and when it is used to create structured documents, like XHTML, it is. It was an offshoot of SGML but had much stricter, and therefore simpler, syntax rules. But then people started using XML for any sort of communication over the network: CSV files got turned into XML (for no real gain other than being XML), then came protocols for method invocation over HTTP (SOAP), then definitions of the interfaces for those method invocations (WSDL). Now it seems that, for any data exchange out there, a lot of people think you need to do it in XML, and that you should define the XML via an XSD (XML Schema Definition). Now XSDs I hate! By defining the schema of an XML document in XML, you are pairing a crude tool for exchanging data with a terrible tool for defining a schema. XSD is painful unless you have some sort of tool to help you. Don’t believe me? Here is the XSD for syndication. Maybe I am crazy, but I think that a schema definition language should be human readable, and I don’t think XSD is. The arguments for XML are many, but they mostly seem to revolve around it being a standard and there being a lot of tools for it. So XML has evolved from a simplification of SGML for the creation of structured documents into a catch-all hammer in the toolbox of many software designers. Soon people will start suggesting that we just write the programs that process XML-based files in some sort of XML-based programming language (oh wait, they did that already with XSL and XSLT). There has to be a better way.

Lately I have been looking at other data exchange formats, focusing on JSON and YAML. Both are more human readable (YAML even more so than JSON) and carry less weight than XML for data exchange. They are standards with decent library support and can cover any structured data format that XML can. There is even a tool out there called Kwalify that creates verifiable schemas for both JSON and YAML. I am also starting to think that there needs to be a language for defining schemas in a language- and platform-neutral way. This language could be used by tools to generate things like XSD (if you have to use XML), YAML for Kwalify, SQL, etc. It would be like a DSL (Domain Specific Language) for defining schemas. I know a lot of people think that creating a parser for a new language is hard, but with tools like ANTLR and yacc it’s fairly easy, and it is a powerful addition to a developer’s toolbox. As Martin Fowler says, don’t be afraid of creating parsers! We need to start thinking about the proper use of XML as a tool. It has its place, but there are better tools out there for many of the things that XML is currently used for. Also, is the obsession with using XML for everything preventing us from creating even better tools? It’s something we need to think about.
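To make the "less weight" point concrete, here is a minimal sketch that serializes the same record as JSON and as XML using only Python's standard library. The listing record and its field names are made up for illustration; they are not taken from the syndication schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical listing record; the field names are illustrative only.
listing = {"id": "12345", "price": 250000, "city": "Chicago", "bedrooms": 3}


def to_json(record):
    """Serialize the record as JSON with keys in a stable order."""
    return json.dumps(record, sort_keys=True)


def to_xml(record):
    """Serialize the record as a flat XML element, one child per field."""
    root = ET.Element("listing")
    for key in sorted(record):
        child = ET.SubElement(root, key)
        child.text = str(record[key])  # XML text is untyped, so numbers become strings
    return ET.tostring(root, encoding="unicode")


json_text = to_json(listing)
xml_text = to_xml(listing)

print(json_text)
print(xml_text)
print("JSON characters:", len(json_text))
print("XML characters:", len(xml_text))
```

For a flat record like this, the XML form comes out longer because every field name appears twice, once in the opening tag and once in the closing tag. The JSON form also keeps the distinction between numbers and strings, which the XML form only gets back through a schema.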

PS: Apologies to Max Cannon, and many thanks to folks that helped create Build Your Own Meat!

We Cannot Allow Ourselves To Have a Syndication Tool Gap!

George C. Scott

All the RETS buzz these days seems to be about the new RESO Syndication standard. It promises to make the lives of syndicators (Google, Zillow, Yahoo, Trulia, etc.), aggregators (ThreeWide, Point2), brokers and even MLSs easier. With a common data format to use, everyone’s workload will go down significantly. But right now there is one actor in that list that will be left with a syndication tool gap: the broker. What is needed is a simple, easy-to-use tool that allows the broker to create a syndication file reliably, even if they don’t have their own listing database.

Syndicators, aggregators and MLSs are all technology companies that already share data online with various parties. Moving to a new common standard is fairly easy for them and, once done, will help improve their efficiency. However, many brokers are small operations with little or no IT staff. What they need is a tool that can run without a database, on a desktop machine. It needs to be able to read from a RETS source and output a syndication file. That file can then be uploaded to a syndicator or aggregator, or put up on a web site to be pulled down by the same. At the April RETS meeting in Philadelphia I demoed a proof of concept tool at the RETS Exhibition that did just that, called the RETS Proxy (I even won a prize!). This tool is not ready to fill the tool gap, since the design I showed has serious limitations, but it is basically the idea that I am going for.

With such a tool, brokers can make effective use of the standard. We at CRT will look into filling this tool gap; I, for one, plan on taking my proof of concept and expanding it to meet the above requirement. We also look forward to others stepping up, maybe filling in the gaps that we missed. The more tools the broker has, the easier it will be for them to use the standard, a standard that would be useless without their data.

Automation Will Set You Free

Some of the most important best practices in software engineering deal with automation: automated builds, automated testing, automated backups, etc. It seems that developers are obsessed with automation, but why? Is it because they are lazy? I think there is something to that (and that’s a good thing, laziness being one of the Three Programmer Virtues), but I think that the main reason is safety. Automated processes have fewer errors.

Computers are really good at certain things, and one of them is doing the same task over and over again without variation. If your build process is ten steps long, an automated version will ALWAYS perform those ten steps. It will never forget a step and cause an error down the line, and if one of the steps fails for any reason, it will stop and hopefully inform someone.

Because of this, I feel that every developer should be familiar with writing scripts. They can be shell scripts, Perl, Ruby or whatever. Developers just need to know how to create scripts that turn error-prone, slow and tedious manual processes into fast, easy and error-free automated processes. In fact, I would feel nervous about hiring a developer who either didn’t know how to create automated scripts or didn’t feel the need to from time to time.
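As a rough illustration of the kind of script meant here, the sketch below runs a list of build steps in order and stops at the first failure. The step names and commands are hypothetical stand-ins (each one just launches a tiny Python command), not a real build; a real script would substitute its own compile, test and packaging commands.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns the names of the steps that completed and the name of the
    step that failed (or None if every step succeeded).
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Report the failure instead of silently carrying on.
            print(f"Step '{name}' failed with exit code {result.returncode}",
                  file=sys.stderr)
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical steps: the second one fails on purpose to show the stop.
steps = [
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("test", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("package", [sys.executable, "-c", "print('packaging')"]),
]

completed, failed = run_steps(steps)
print("completed:", completed)
print("failed:", failed)
```

Because the "test" step exits with a non-zero code, the "package" step never runs. That is the safety property: the script either performs every step or stops and says where it stopped.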

I have also found that this rule of automation can and should be applied to processes outside of software engineering. Rules to filter your email are an example of this. The power of automation is what makes macros (properly used) in Microsoft Office and the Automator tool on OS X such important tools. With these tools the average user can, with a little time, automate those painful tasks, saving time and preventing errors.

From time to time I will be providing examples of things I have automated in my own work and of how you can automate common painful tasks that everyone seems to deal with. Keep an eye out for them!

Carpe Datum

The new year is here and already it has brought about huge changes! Earlier this month, RESO Chair Michael Wurzer wrote an open letter to Yahoo!, Google, Trulia and Zillow to get them to support RETS as a common data standard for everyone in the industry. I was going to write about this topic anyway, but a recent press release makes it even more relevant. Today, Yahoo!, Google, Trulia and Zillow, as well as several other aggregators, have all agreed to work with RESO to adopt common data standards that make it possible for brokers to send a single listings feed to multiple web sites.

This is a big win and opportunity for RETS. With the influx of a large amount of new blood into RESO, the RETS schema has the potential to grow into THE data standard for real estate data sharing. But I also see a danger here for RETS. If RESO and the RETS community do not seize this opportunity and run with it, these companies will work together and create something else. The reason they are all willing to work together on this is that there is a huge need, and if RETS is unable to fill that need, something else will. So, let’s hope that RESO and the RETS community see this opportunity and seize the data!

Beware of the Leopard, Part 2

So I have been using Leopard now for a little over a week. All in all the experience has been a good one, but with some niggling issues.

1. Time Machine. I’ve had some issues with TM in the week I’ve been working with it.

- When it works it’s great. But it seems that the first backup of the day slows my machine to a crawl for a long time. After that first backup, all the incremental backups run just fine. I improved this a bit by taking certain files and folders out of TM. The ones that I took out were Parallels VMs and files that I have under source control.

- I also had to fiddle with the folders TM backed up, along with Spotlight. I had Spotlight NOT index my TM folders (I don’t want to see them in search results). The problem with this is that when I need to find a file to recover, I can’t use Spotlight to help me. What I really want is for Spotlight to be smart about TM backups. I want my TM backups to be indexed but kept out of normal search results. When I need to, I would like to be able to easily search TM backups in Spotlight, with Spotlight marking each file as a backup and showing which backup it was found in. In this mode I would also like to be able to recover the file. I guess I want better integration between Spotlight and Time Machine, which seems like a no-brainer to me.

2. Mail. Faster and more useful than the one in Tiger, but I’ve noticed some stability problems, and the way some of the new features work is not very compelling to me.

- The Mail Act-On plugin just stops working at times. I need to restart Mail to get it to work.

- Mail sometimes hangs. I am forced to Force Quit Mail and restart. Sometimes, when it hangs, it hangs hard (usually when TM is backing up, so I think they are interfering with each other). When this happens Force Quit doesn’t work, and I have to restart the whole Mac to fix it.

- Notes & ToDos. When I got the Leopard version of MailTags, one of the things they took out was Events and ToDos. This bothers me now. The way Notes, ToDos and Events are handled in the new Mail is not as useful as the way MailTags did it. If I store a ToDo on my IMAP server, it puts that ToDo in a calendar called ‘calendar’ in the group CRT. I don’t want this. I want it to create the ToDo in my Work calendar. As a result I have configured Mail to store all notes and ToDos on my local computer. This allows me to create ToDos from Mail in the correct calendar, but I cannot tie them to a mail message. The notes are OK, but I have yet to use them. It might take me some time to find this feature useful.

So, after a week I can see where the problems in Leopard are. I still think this upgrade is useful. But as with all upgrades, Apple will have to do some work to iron out all the kinks.

Beware of the Leopard, Part 1

So today I joined the Apple masses in upgrading my work Mac to Leopard. I figured I’d give my first day impressions.

1. The upgrade itself was painless. While some people have had problems with the upgrade, I fortunately did not run into any.

2. The Finder has a new look. I like the Cover Flow view and the sources sidebar. It seems that Apple is going to unify their look-and-feel around the iTunes model. I will need to work with Leopard more to give a better review.

3. Time Machine got set up with almost no effort. It recognized my USB hard drive and asked me if I wanted to use that drive for Time Machine. All I had to do was tell it what not to back up. The rest it did on its own. I will need to use this over time to see if it holds up.

4. iCal. I like iCal’s new look. I am also looking forward to seeing how iCal works with CalDAV. The only issue I had was a little trouble with Spanning Sync. Nothing serious, but I did need to upgrade to the latest version.

5. Mail. This is the app that I use the most during the day, and therefore the app where I noticed the changes the most.

- All my plugins were deactivated when I started Mail. This was annoying. I use MailTags and Mail Act-On heavily and have come to count on them. However, I was able to get both those plugins working again; had I not, I would have seriously considered going back to Tiger.

- IMAP seems to be much faster. One of the annoying things with Mail Act-On was that it took some time to finish actions that required an IMAP write back to the server. Now these actions are much faster. It seems that Mail has improved its IMAP support. It even supports IDLE without a plugin!

- Reminders in Mail are nice. Since I use a Tickler file with MailTags due dates, I have turned Mail into a little GTD program. The only problem I have is that there seems to be no way for me to tell Mail to hide ToDos until they are due today or past due. I also cannot view them at the same time as my Today tickler folder. I’d like to have one view to see everything I need to do.

- RSS in Mail is also nice. While I still plan to keep using Vienna for most of my RSS needs, I do plan on moving some of my RSS feeds to Mail. All of the RETS.org feeds can go in there, and since MailTags works with RSS articles I can incorporate them into my Today folder.

That’s it so far. I have yet to use Spaces, and I really need more time with Time Machine, Finder and Mail to give a more detailed review. I plan on posting again after Annual, and we can see how well Leopard works on a travel laptop.

The Internet Is A Series Of Pipes

I recently found myself in want of a service that allowed me to aggregate my Google Calendars. In searching around I stumbled across Pipes. Pipes is a service that allows you to aggregate several feed sources into a single feed. The sources and output can be RSS, iCalendar and other formats. What sets Pipes apart is that you can filter, sort and transform the items in the feeds before outputting the aggregate, so you can create some very powerful custom feeds. The idea behind Pipes is similar to *nix command line tools, where you can pipe the output of one tool to the input of another to create some very powerful chains of commands. Pipes has a fairly intuitive user interface for wiring the different pieces together to create your aggregate feed: you just drag the output of one piece to the input of another.

Chris was able to use Pipes to create a feed that searches blogs to see whether NAR is mentioned, to help us find out who is talking about us.
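The aggregate, filter and sort idea can be sketched in a few lines of Python. This is only an analogy for the wiring described above, not how Pipes itself is implemented; the feed items, dates and function names are invented for the example.

```python
from itertools import chain

# Hypothetical feed items; a real feed would be parsed from RSS or iCalendar.
feed_a = [
    {"title": "NAR releases report", "date": "2007-05-02"},
    {"title": "Weather update", "date": "2007-05-01"},
]
feed_b = [
    {"title": "Blog post about NAR", "date": "2007-05-03"},
]


def merge(*feeds):
    """Aggregate several feeds into one stream of items."""
    return chain.from_iterable(feeds)


def keep_matching(items, keyword):
    """Filter: pass along only items whose title mentions the keyword."""
    return (item for item in items if keyword.lower() in item["title"].lower())


def newest_first(items):
    """Sort: order items by ISO date string, most recent first."""
    return sorted(items, key=lambda item: item["date"], reverse=True)


# Each stage takes the previous stage's output as its input, like a shell pipe.
result = newest_first(keep_matching(merge(feed_a, feed_b), "NAR"))
titles = [item["title"] for item in result]
print(titles)
```

Each function plays the role of one piece on the canvas: its input is another piece's output, so stages can be rearranged or swapped without touching the rest of the chain.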

Pipes is still in beta; I, for one, was unable to aggregate different Google calendar feeds in a way that could be imported into iCal. Also, documentation of how to use some of the pieces is pretty sparse. But Pipes is a tool that has a lot of possibilities, and I look forward to seeing how it develops.

Google vs. the Desktop

Hello all! I’m Andrew Tillman, a new addition to CRT. I’ve been here for a little over two months, and Keith told me that to get a Blogger ribbon here at midyear I need to write a blog post. Please bear with me, as this is my first blog post... ever.

The current topic I have on my mind is the debate over desktop applications vs. web-based applications. It seems that as the technologies that have driven the resurgence of internet companies (aka Web 2.0) improve, web-based applications are becoming more and more viable. Google is the most well known of these companies and offers a full range of web-based applications such as Gmail, GCal and many others. It seems that not a day goes by that Google doesn’t create or acquire a new way of doing online what was once done solely by desktop applications. Is the desktop going to evolve into a dumb terminal that solely provides internet access? Or is there another approach that can be taken?

The desktop application has several advantages: offline access, a better user interface and integration with other desktop applications. However, having all your data on one computer is asking for a hard drive failure, and it can be annoying when you need to get to your data or applications but don’t have access to your computer. Web applications address a lot of these shortcomings: they can be used anywhere, you don’t need to worry about hard drive failure, etc. But you are hosed if you don’t have internet access. There is also the concern of having your data solely in the hands of another company.

I think in the end I prefer what I call hybrid solutions. Hybrid solutions are ones that store the data on a server on the internet, but allow access to it through both desktop applications and a web interface. I get the advantages of a desktop application: offline access, a better interface and integration with other applications. I also get the advantages of a web application: access from any computer, data sharing between multiple computers and easy recovery of data after a hard drive failure. Also, since the data is stored locally as well as remotely, I still have my data available to me if my service provider experiences a catastrophic failure. The biggest drawbacks to a hybrid solution are that it is more complex and that, if you have privacy concerns about someone like Google having access to your data, those concerns are still not addressed.

To that end I use Fastmail for my personal email; they have IMAP access as well as a very good web mail client. I use Spanning Sync to sync my Google calendar with my Mac’s iCal. I use del.icio.us even though it doesn’t quite qualify; however, if I cannot get to the internet, I cannot use my bookmarks anyway. I am currently looking for a similar service for my address book as well as one for iPhoto and iTunes (the size of the files involved makes this more problematic). I have also started to look into getting the Vienna news reader integrated with Google Reader.

How do you address this issue? Are you solely desktop, all web apps or somewhere in between?