I trust Google. I really do. At least more than I trust any other company that I deal with regularly.
At the same time, I’ve been using computers long enough that there is only one kind of backup I trust completely: a backup that I control on a local medium.
So, in light of that, I have a Python script to help create backups of blogs on Blogger.
The script, blogger_backup.py, is available on my webpage under the GPL v2.0.
This is a fairly primitive script, but it does have some nice features (especially the fact that it works well unsupervised as a cron job). There are two main requirements. First, you must have Python installed. Any halfway reasonable UNIX-like system will have it, and it exists for Windows as well. Second, you must set the feeds in the Blogger dashboard to ‘full’.
Once those two things are taken care of, just run the script followed by the name of the blog. In my case, it would be:
./blogger-backup.py netpurgatory
This gets an XML file with the 100 most recent posts and the 100 most recent comments. (Kayhan pointed out that, contrary to what I thought, this script can only grab 100 of each, not 1000. Hopefully I can find some way around that before moving to the Google API.) It also cannot get any photos you have uploaded (which are probably in a Picasa album). I hope to fix both limitations by moving from a simple Python script to the Google APIs for Blogger and Picasa. Those provide much more powerful features (but they require installed libraries) and should allow for a more complete backup. For the moment, however, my script will do.
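For anyone curious what a feed-based backup of this kind looks like, here is a minimal sketch. It is not the actual blogger_backup.py. The feed URL layout (feeds/posts/default and feeds/comments/default on blogspot.com), the function names, and the output file names are all my assumptions for illustration, and unlike the real script it writes two files rather than one. The 100-entry cap is the one described above.

```python
#!/usr/bin/env python3
"""Sketch of a feed-based Blogger backup (NOT the original blogger_backup.py)."""
import time
import urllib.request
from urllib.parse import urlencode

# Assumption: posts and comments are exposed as two separate feeds.
FEEDS = ("posts", "comments")
# Blogger appears to return at most 100 entries, whatever is requested.
MAX_RESULTS = 100


def feed_url(blog, kind, max_results=MAX_RESULTS):
    """Build the (assumed) feed URL for a blog name such as 'netpurgatory'."""
    query = urlencode({"max-results": max_results})
    return f"https://{blog}.blogspot.com/feeds/{kind}/default?{query}"


def backup(blog, stamp=None):
    """Download each feed and save it to a dated XML file in the current directory."""
    stamp = stamp or time.strftime("%Y%m%d")
    written = []
    for kind in FEEDS:
        # Hypothetical naming scheme: <blog>-<feed>-<date>.xml
        filename = f"{blog}-{kind}-{stamp}.xml"
        with urllib.request.urlopen(feed_url(blog, kind)) as response:
            data = response.read()
        with open(filename, "wb") as out:
            out.write(data)
        written.append(filename)
    return written
```

Calling backup("netpurgatory") would fetch both feeds and return the names of the files it wrote. Because it needs no input beyond the blog name, something like this is easy to run from cron. The dated file names keep one day's run from overwriting the previous day's.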
I’ve got something in the works to back up Gmail accounts as well (using the libgmail libraries), but that will have to wait a bit.
3 Comments
It seems that despite the ‘max-results=1000’, blogger only returns the 100 most recent entries. (Try it on mine.) I also tried max-results=500, and the same thing happened.
I would be very interested in a mail backup utility, though.
When did you switch from Perl to Python?
What’s a factor of ten between astronomers?
From the looks of it, contrary to what I read online (I can’t believe the internet could lie to me), those URLs can only get you 100 posts.
That’s a real shame, since I liked how simple that script was. I guess I’ll play around and see if there are any other URL-centric ways to get at the posts before going to the APIs.
I’ll try to clean up the mail thing and put it up in the next few days. It is a little braindead, but isn’t a bad starting point.
As for the Python, I needed something with better arrays than Perl, so I decided to give it a shot a while ago. I still prefer the way Perl does regular expressions, but I probably use Python for about 60% of my scripting and analysis these days.
I don’t mean to complain about software that you are planning on giving away free of charge, but this little Gmail snafu today scared the daylights out of me. I’m looking forward to the Gmail backup sooner rather than later!
I do remember reading that you can set Gmail to allow POP3 on all emails (not just new ones), and then the first time you do a POP3 download, you can download them all. I’m way too dependent on Gmail as my centralized, universally accessible email archive.