Back on my feet and ready to sail the seas of trapped information, ya scurvy dogs!

Ahoy mateys, so that “moose fever” – turned into pneumonia for me! On top of which my entire family got sick too. But we’re finally over that now, so time to break the silence and set sail on the seas of end-user mashups.

As much as I felt some small discouragement with the NV mashups workshop, because certain technologies blew up during the session and we didn't stick with a more hands-on format, I have not given up on the dream of exploring mashups for non-programmers and have continued on, scratching a few of my personal itches.

Job Postings Want to be Free!

The first one, which I mentioned in that session, is around HR departments that don’t provide RSS feeds for their job postings. I’m sure some smart HR professional out there will clue me in to how this is intentional and keeps the riff-raff out, but from where I’m sitting I would love to be able to help people who are already interested in my organization to easily monitor new openings, especially in this tight job market with more jobs than potential employees.

But if you are in higher ed in B.C., my particular niche, you are out of luck unless you want a job at UBC, the only institution that so far seems to have grokked this. I've found ways, over the years, to get this information pushed to my email (the WatchThisPage service has been particularly useful in this regard), but I mean, email, ick, that's so 1994 (whoops, a real pirate would never have said 'ick'). So my goal was to see if I could build an aggregated page of BC post-secondary job postings, one with an RSS feed too.

There are basically four steps in the process:

1. Identify the Pages

This one’s easy – go to the various institution sites in the province, and locate their job postings pages. In this experiment I used the job postings pages from schools local to me, Royal Roads University, Camosun College, University of Victoria (and UBC’s because they were already RSS).

2. Scrape the pages

So the problem with all these pages – no RSS feeds. That's where a service like Dapper comes in. Dapper offers a fairly simple way, through the use of a 'virtual browser,' to look at a web page and tell it which elements on the page you would like to scrape out as data. It then lets you access this scraped information as XML, HTML, RSS, CSV, JSON, as a Netvibes module or Google gadget, and more. An example is this dapp that scrapes the UVic job postings page.
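If you're curious what a service like Dapper is doing behind the curtain, here's a minimal sketch in plain Python: walk a page's markup and pick out the repeated elements as feed items. The page snippet and its `class="posting"` links are entirely hypothetical stand-ins for a real jobs page, and the stdlib `html.parser` stands in for Dapper's virtual browser.

```python
# A rough sketch of the scraping step: collect the text and href of
# every <a class="posting"> element as (title, link) feed items.
# The markup structure here is an assumption, not any real site's.
from html.parser import HTMLParser

class PostingScraper(HTMLParser):
    """Gather (title, link) pairs from hypothetical 'posting' links."""
    def __init__(self):
        super().__init__()
        self.items = []    # list of (title, link) tuples
        self._href = None  # href of the posting link we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "posting":
            self._href = attrs.get("href", "")

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.items.append((data.strip(), self._href))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

# In practice the HTML would come from the live jobs page; a canned
# snippet stands in for it here.
page = """
<ul>
  <li><a class="posting" href="/jobs/101">Library Technician</a></li>
  <li><a class="posting" href="/jobs/102">Instructional Designer</a></li>
</ul>
"""

scraper = PostingScraper()
scraper.feed(page)
print(scraper.items)
# → [('Library Technician', '/jobs/101'), ('Instructional Designer', '/jobs/102')]
```

Of course, the whole point of Dapper is that you point and click instead of writing a parser like this, and you don't have to update code every time the page layout shifts.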

3. Clean up the feeds

Now you'll notice in the UVic job posting example, there is all sorts of cruft in the feed. Dapper is ultimately only as good as the page that it is scraping. It does its best to identify logical groupings based on the page markup, and the more logically the XHTML has been used, the better it does, but HTML itself isn't a logical markup language. Dapper does let you tweak the scraper with constraints, but this is one aspect of Dapper I find not to be overly intuitive.

So instead of trying to clean the feeds up in Dapper, I take the crufty feeds from Dapper into Yahoo Pipes, which offers a much easier way to clean them up. In the case of the UVic feed, that means creating a filter that permits only those items containing the text "Comp," which turns out to be the common element in all of their postings. Here's the pipe; if you clone it you can see the various feeds being cleaned up.
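The Pipes filter amounts to a one-liner if you picture feed items as simple records. A minimal sketch, assuming items are dicts with a `title` field (the field name and sample items are my invention, not from the real UVic feed):

```python
# A stand-in for the Pipes "Filter" module set to Permit + Contains:
# keep only items whose title contains the keyword, drop the cruft.
def filter_items(items, keyword):
    """Permit only feed items whose title contains the keyword."""
    return [item for item in items if keyword in item["title"]]

# Hypothetical crufty feed, mixing real postings with page furniture.
crufty_feed = [
    {"title": "Comp #2007-031: Programmer Analyst"},
    {"title": "How to apply"},         # navigation cruft
    {"title": "Comp #2007-032: Lab Instructor"},
    {"title": "Equity statement"},     # more cruft
]

clean = filter_items(crufty_feed, "Comp")
print([item["title"] for item in clean])
# → ['Comp #2007-031: Programmer Analyst', 'Comp #2007-032: Lab Instructor']
```

The fragile part, in Pipes as here, is that the whole cleanup hinges on every real posting sharing that one magic string; change the posting format and the filter silently starts dropping jobs.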

4. Aggregate all of the new feeds

This turns out to be simple once these other steps have been taken. There are lots of feed aggregation services out there, but since we already brought all of the feeds into Pipes to clean them up, it's easy to just use the 'union' function there to join them into one master feed of job postings.
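The union step can be sketched the same way: concatenate the cleaned feeds into one list, then sort newest-first. The per-institution feeds, their items, and the `pubdate` field are all hypothetical, assumed only for illustration.

```python
# A sketch of the Pipes 'union' step: join several cleaned feeds into
# one master feed, newest items first. Items and field names are
# assumptions, not the shape of any real feed.
from datetime import date

uvic = [{"title": "Comp #2007-031: Programmer Analyst",
         "pubdate": date(2007, 3, 12)}]
camosun = [{"title": "ECE Instructor", "pubdate": date(2007, 3, 10)}]
royal_roads = [{"title": "Learning Designer", "pubdate": date(2007, 3, 14)}]

def union(*feeds):
    """Concatenate any number of feeds, then sort newest first."""
    merged = [item for feed in feeds for item in feed]
    return sorted(merged, key=lambda item: item["pubdate"], reverse=True)

master = union(uvic, camosun, royal_roads)
print([item["title"] for item in master])
# → ['Learning Designer', 'Comp #2007-031: Programmer Analyst', 'ECE Instructor']
```

Which is really all the master feed is: a merge and a sort, which is exactly why this step is the easy one once the scraping and cleaning are done.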

About the result, and why am I talking like a pirate?

So obviously the resulting feed above only contains 4 of the 26 institutions in BC. It's really just rinse and repeat to get the rest, plus some formatting cleanup, which I purposely didn't do (at least not publicly, hehe).

My intent in documenting this exercise (and the next one) was not to provide a production-ready feed of all BC post-secondary job postings, handy as that might be. It was instead to

  • illustrate how YOU can use tools like Dapper and Yahoo Pipes to create feeds and aggregations for data on almost any webpage (made seemingly even easier now with Dapper's release of the DapperFox plugin)
  • spur information providers on to doing it right the first time – there is NO reason (as we will see in the next example too) to ever provide another list, another calendar, another set of links, etc, in a way that by default traps the content in a single presentation, only ever editable by a single author. NO REASON, and lots of GOOD reasons not to. The separation of content and presentation should have already become one of the default criteria you use to select any technology. If the tools you are using don’t support RSS or some other means to do this, use one of the HUNDREDS of FREE ones that do. And at the very least, please adopt tools that produce proper XHTML – accessibility means providing access, and if you won’t do it to cater to web wonks like me, do it at least to serve people who have no other choice but to consume your page through a text reader or other assistive device. If you don’t, someday someone may make you.

And the pirate metaphors? Well, certain, shall we say, 'issues' around intellectual property were pointed out to me during the NV mashups workshop, and I guess this is kind of my reply: if you aren't going to provide the data for users in a way that enables them to use it how THEY want to, don't be surprised when they go and do it themselves, arrgghhh.

So until our next swashbuckling adventure, I remain yours truly, Cabin Boy Nessman of the good ship Syndication.


7 thoughts on “Back on my feet and ready to sail the seas of trapped information, ya scurvy dogs!”

  1. I work for the institute, and I didn’t even notice… Time to aggregate the job posting page… A note to the administrators: if you can’t do RSS, at the least provide search-engine-friendly URLs. Glad you are recovered, Scott.


  2. Pneumonia! I hope yer days livin’ as a bilge rat have finally sailed t’ seas o’ history.

    I do hope your frustration has also passed. The tool failure was not your fault, and cutting the session short just showed awareness of the vibe of people’s energy… people were ready to graze, not binge on any one subject.

    Thanks for the pointer to DapperFox, and for a killer rant. If any scurvy-assed scallywag gives you the IP jig, I say we run ’em through with a rusty scabbard!


  3. […] looks obvious, and indeed needs saying, because the accumulated wisdom of the education community is helpless on this point and doesn’t seem able to articulate it. Things like event and job listings (which Scott Leslie covers in the previous post) can easily be aggregated. It only takes a little work to collect these resources and produce an RSS feed users can subscribe to. This is very much how I built Edu_RSS (which I really must get running again). Granted, it would all be much easier if institutions actually pitched in, say by producing a “jobs RSS feed” and an “events RSS feed.” But as I said, it’s easy to just sit there, and that helps nothing. I’ll be doing more work in this area soon. Scott Leslie, EdTechPost March 14, 2007 [original link] [join the discussion] […]


  4. I am glad you are back on your feet! I was back at work for only a few days before succumbing again and have been at home, mostly in bed, for a full week. I’m still only about 60%, but I think I see light at the end of the tunnel… and I am GLAD, because I have a lot of things I want to do/resume/start after all the inspiration at NV.


  5. Hey Scott, I don’t know if this is the place to ask you this, but I’m thinking of changing jobs, something easy, something I can manage with my present minimal computer skills.

    I don’t want to have to start by paying money to my new employer. And I can’t see me doing anything for less than a couple grand per month.

    I dunno know, maybe something in Logistics? I’ve always wanted to work in Logistics, or at least to know what Logistics is…

    I can’t relocate, but would love to do some part-time work in the States, Germany or Australia — and you know how much I’ve always wanted to visit Spain.

    Just a longshot — but let me know if you hear anything, OK?


  6. Hey Brian, funny stuff! Sorry about the comment spam – Akismet does a fantastic job, but for reasons still unknown to me this post (and 2 or 3 other ones) seems to let through the same damn “Logistics job offer” spam. I will look into it, but my apologies.


Comments are closed.