Generating sitemaps with Rails
Introduction
Sitemaps are XML files sitting on your web server that are used to give hints to search engines about the existence of new pages, how often pages change and the priority a search engine should give to each page.
The format is very simple to understand and complete documentation is
available on sitemaps.org. There is a root urlset element which contains
multiple url elements. Each of these url elements must contain the loc
element, which specifies the location of that URL. It can optionally contain
lastmod, the date and time at which the page was last modified; changefreq,
how often you expect the page to change; and priority, the priority of that
page compared to others on your website.
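To make the structure concrete, here is a small plain-Ruby sketch that assembles a sitemap by hand. The helper names (xml_escape, url_entry, sitemap_xml) and the example URLs are made up for illustration; this is not how Rails generates the file, just a way to see which elements go where.

```ruby
# Escape the characters that must be entity-escaped in sitemap values.
def xml_escape(text)
  text.to_s
      .gsub('&', '&amp;')
      .gsub('<', '&lt;')
      .gsub('>', '&gt;')
      .gsub('"', '&quot;')
      .gsub("'", '&apos;')
end

# Build one <url> entry; only :loc is required, the other fields are optional.
def url_entry(page)
  lines = ['  <url>', "    <loc>#{xml_escape(page[:loc])}</loc>"]
  [:lastmod, :changefreq, :priority].each do |field|
    lines << "    <#{field}>#{xml_escape(page[field])}</#{field}>" if page[field]
  end
  lines << '  </url>'
  lines.join("\n")
end

# Wrap all the entries in the root <urlset> element.
def sitemap_xml(pages)
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *pages.map { |page| url_entry(page) },
    '</urlset>'
  ].join("\n")
end

pages = [
  { :loc => 'http://example.com/', :lastmod => '2012-01-31',
    :changefreq => 'weekly', :priority => 0.8 },
  { :loc => 'http://example.com/photos/1?size=large&page=2' }
]
puts sitemap_xml(pages)
```

The second entry only has a loc, which is all the format requires, and its ampersand comes out as &amp;amp; in the generated file.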
Implementing sitemaps in Rails
Rails makes it very easy to generate your own sitemap.xml file dynamically.
The sitemap can be implemented using a simple controller to generate a list of
pages and a view that uses Builder to generate the XML output.
The first step is to create the route in the config/routes.rb file, which
will look like this:
get 'sitemap', :to => 'sitemap#show'
This route sends GET requests for sitemap.xml to the SitemapController’s show
method. You do not need to worry about specifying .xml in the route; Rails
will automatically figure it out and render the correct view.
The next step is to create the controller in the
app/controllers/sitemap_controller.rb file. This needs to fetch information
about all of your pages from the database or disk or wherever you are storing
them, so the code will vary. In this example I’ll pretend that I’ve got a
Photo model and a page for each photo, along with a Link model used by a
single page of links.
class SitemapController < ApplicationController
  def show
    # grab info about all the photos since they each have their own page
    @photos = Photo.all
    # grab info about the most recently-updated link as they share a page
    # (this is nil if there are no links yet)
    @link = Link.order('updated_at desc').first
  end
end
This is fairly simple code that should be easily understood by anyone with a basic understanding of ActiveRecord.
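If the "most recently updated" query is not obvious, here is a plain-Ruby sketch of the same idea without a database. The Link struct below is only a stand-in for the real model, and links_page_lastmod is a made-up helper. It also shows the one edge case worth knowing about: with no links at all, there is no record to take a date from.

```ruby
# Stand-in for the Link model: just a title and a timestamp.
Link = Struct.new(:title, :updated_at)

# Plain-Ruby equivalent of "order by updated_at desc, take the first row".
def most_recently_updated(links)
  links.max_by(&:updated_at)
end

# Date to report for the links page, or nil when there are no links yet.
def links_page_lastmod(links)
  link = most_recently_updated(links)
  link && link.updated_at.strftime('%Y-%m-%d')
end

links = [
  Link.new('Ruby',    Time.utc(2012, 1, 10)),
  Link.new('Rails',   Time.utc(2012, 3, 5)),
  Link.new('Builder', Time.utc(2011, 12, 25))
]

puts links_page_lastmod(links)      # 2012-03-05
puts links_page_lastmod([]).inspect # nil
```

In the real application you would want a similar nil check before using @link in the view.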
The final part is to create a view called app/views/sitemap/show.xml.builder.
The name signifies that this is a view for the XML format (as I said earlier,
Rails uses this to automatically detect the view to use) using Builder as a
template engine.
Builder uses a domain-specific language implemented on top of Ruby to allow you to easily create XML documents.
In general, your calls to Builder look like this:
xml.tag_name :attribute => 'value of attribute' do
  # nest tags here, can include ruby code to do loops, etc.
end
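To give a feel for how a DSL like this can work, here is a toy sketch built on Ruby's method_missing. TinyXml is a made-up class for illustration only, not Builder's actual implementation: it does no escaping and ignores everything else a real library has to handle.

```ruby
# Toy XML writer: any unknown method name becomes a tag.
class TinyXml
  attr_reader :output

  def initialize
    @output = String.new
    @depth = 0
  end

  # xml.tag 'text'            => <tag>text</tag>
  # xml.tag :a => 'b' do..end => <tag a="b"> ...nested tags... </tag>
  def method_missing(tag, *args, &block)
    attrs = args.last.is_a?(Hash) ? args.pop : {}
    text = args.first
    attr_str = attrs.map { |name, value| %( #{name}="#{value}") }.join
    indent = '  ' * @depth

    if block
      @output << "#{indent}<#{tag}#{attr_str}>\n"
      @depth += 1
      block.call
      @depth -= 1
      @output << "#{indent}</#{tag}>\n"
    else
      @output << "#{indent}<#{tag}#{attr_str}>#{text}</#{tag}>\n"
    end
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

xml = TinyXml.new
xml.urlset :xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  xml.url do
    xml.loc 'http://example.com/'
  end
end
puts xml.output
```

Because the nested tags are written inside ordinary Ruby blocks, loops and conditionals can sit between them, which is what makes this style convenient for generating documents from model data.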
Here is the code in the view that we can use to produce the sitemap:
# this produces the <?xml ... ?> tag at the start of the document
# note: this is different to calling builder normally as the <?xml?> tag
# is very different to how you'd write a normal tag!
xml.instruct! :xml, :version => '1.0', :encoding => 'UTF-8'

# create the urlset
xml.urlset :xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  # photo pages
  @photos.each do |photo|
    xml.url do # create the url entry, with the specified location and date
      xml.loc photo_url(photo)
      xml.lastmod photo.updated_at.strftime('%Y-%m-%d')
    end
  end

  # links page
  xml.url do
    xml.loc links_url
    xml.lastmod @link.updated_at.strftime('%Y-%m-%d')
  end
end
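The lastmod values in the view are plain dates. The sitemap format also accepts a full W3C date and time, so here is a short sketch comparing the two using Ruby's standard Time class. The helper names lastmod_date and lastmod_datetime are made up, and I'm assuming updated_at behaves like a regular Time here.

```ruby
require 'time' # adds Time#iso8601

# Date only, as used in the view: YYYY-MM-DD.
def lastmod_date(time)
  time.strftime('%Y-%m-%d')
end

# Full timestamp in UTC; getutc returns a copy, so the original is untouched.
def lastmod_datetime(time)
  time.getutc.iso8601
end

updated_at = Time.utc(2012, 3, 5, 14, 30, 0)

puts lastmod_date(updated_at)     # 2012-03-05
puts lastmod_datetime(updated_at) # 2012-03-05T14:30:00Z
```

The date-only form is enough for pages that change a few times a week. The full timestamp is only worth the extra characters if your pages change several times a day.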
We’re almost done now. First, I’d fire up WEBrick, request /sitemap.xml to
check that it works, and fix any issues. The last step is to create or modify
your robots.txt file to specify the location of the sitemap.xml file.
To do this, simply add a line like the following to the bottom of your
public/robots.txt file:
Sitemap: http://example.com/sitemap.xml
where example.com is your domain name.
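If you maintain robots.txt from a script, the same step can be sketched in a few lines of Ruby. with_sitemap_line is a made-up helper, not part of Rails. It works on the file's contents as a string and leaves the text alone if the line is already there.

```ruby
# Return robots.txt content with a Sitemap line appended, unless it is
# already present.
def with_sitemap_line(robots_txt, sitemap_url)
  line = "Sitemap: #{sitemap_url}"
  return robots_txt if robots_txt.each_line.any? { |existing| existing.strip == line }

  # Make sure the new line starts on a line of its own.
  separator = (robots_txt.empty? || robots_txt.end_with?("\n")) ? '' : "\n"
  "#{robots_txt}#{separator}#{line}\n"
end

current = "User-agent: *\nDisallow:\n"
puts with_sitemap_line(current, 'http://example.com/sitemap.xml')
```

To apply it to the real file you would read public/robots.txt, pass the contents through the helper, and write the result back.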
Caching your sitemap
If you’re running a small site then you probably don’t need to worry about this. However, on a large site each request to your sitemap could end up pulling a large amount of data from your database so you may wish to cache it to speed things up.
On my sitemap I use page caching. To enable this, you simply need to add the following line to the top of your controller:
caches_page :show
This will make Rails save the page to the disk when it is generated, so Rails isn’t even involved when a client requests the page for the second time.
However, the final thing you need to do is expire the cached sitemap when something changes. To do this, add the following snippet to any controller action that creates, updates or deletes a page:
expire_page :controller => :sitemap, :action => :show
You might also want to look into sweepers to avoid copying and pasting that snippet of code around everywhere in a complex application, but if you’re just running a simple blog or personal site the code above will probably be sufficient.
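To show the idea behind page caching and expiry without any Rails involved, here is a toy sketch. ToyPageCache is a made-up class and not how Rails implements caches_page or expire_page. It only demonstrates the pattern: write the generated page to disk once, serve the file after that, and delete the file when the data changes.

```ruby
require 'fileutils'
require 'tmpdir'

# Toy write-to-disk cache keyed by request path.
class ToyPageCache
  def initialize(public_dir)
    @public_dir = public_dir
  end

  # Return the cached file if present; otherwise run the block to
  # generate the page, save it to disk, and return it.
  def fetch(path)
    file = File.join(@public_dir, path)
    return File.read(file) if File.exist?(file)

    body = yield
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, body)
    body
  end

  # Remove the cached file so the next request regenerates it.
  def expire(path)
    FileUtils.rm_f(File.join(@public_dir, path))
  end
end

Dir.mktmpdir do |dir|
  cache = ToyPageCache.new(dir)
  generated = 0
  build = lambda { generated += 1; "<urlset><!-- build #{generated} --></urlset>" }

  cache.fetch('sitemap.xml') { build.call } # generated and written to disk
  cache.fetch('sitemap.xml') { build.call } # served from disk, block not run
  cache.expire('sitemap.xml')               # e.g. after a photo is updated
  cache.fetch('sitemap.xml') { build.call } # generated again
  puts "generated #{generated} times"       # generated 2 times
end
```

The expensive part, the block that builds the page, only runs when no cached file exists. That is why forgetting to expire the file leaves search engines with a stale sitemap.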
Submitting your sitemap to search engines
Once you’re done, you’ll probably want to submit your sitemap to search engines. Generally they will find it automatically through robots.txt, but some of them provide tools to see how often they look at your sitemap or whether there are any problems with it.
For Google, you can do this with the webmaster tools. The official sitemaps website also has more information about this.