News You Can Bruise for February 21 (entry 0)
Top-line numbers: I've archived another gigabytes of good stuff, including 18k maps and schematics, 1k mods, 11k skins, 7k texture packs (resource packs now, I guess), and k screenshots. I was able to archive about 73% of the maps. Four percent of the maps were just gone, and 23% I didn't know how to download.
The Minecraft Archive Project capture contains data from four sites. The new capture is limited to two sites: the official Minecraft forum and the huge Planet Minecraft site. I started archiving maps, mods, and textures for Minecraft Pocket Edition, and was able to pick up about MCPE maps.
Now that I've done this twice without getting into trouble, I'll give a little more detail about the process. I've got scripts that download the archives of the Minecraft forum and Planet Minecraft. I find all the threads/projects modified since the last capture, download the corresponding detail pages (e.g. the first page of a forum thread--I'm only after the original post), and extract all the links.
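The "extract all the links" step could be sketched roughly like this. This is a minimal illustration using only the Python standard library, not the actual archiving scripts; the `LinkExtractor` class, the `extract_links` function, and the example URLs are all made up for the sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects link targets from <a href> and <img src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.links.append(urljoin(self.base_url, attrs["src"]))


def extract_links(html, base_url):
    """Return every distinct link in the page, in order of first appearance."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


if __name__ == "__main__":
    page = '<p><a href="/dl/map.zip">map</a> <img src="http://i.example.com/a.png"></p>'
    for link in extract_links(page, "http://forum.example.com/thread/1"):
        print(link)
```

A real run would feed this only the original post rather than the whole thread page, which would take some site-specific trimming first.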
Then it's a matter of archiving as many of those links as possible. I've written recipes for archiving images and downloads. These six recipes take care of the vast majority of items:
- Two file hosts: Mediafire and Dropbox
- Four image hosts: imgur, Photobucket, TinyPic, and gwd.es
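One way to route a link to the right recipe is a lookup on the hostname. The sketch below is a guess at the shape of such a dispatcher, not the real code: the domain names in `RECIPES` are my assumptions about where those hosts live, the recipe labels are placeholders, and only the hosts with unambiguous names are included.

```python
from urllib.parse import urlparse

# Hypothetical table: registered domain -> kind of recipe to run.
RECIPES = {
    "mediafire.com": "file",
    "dropbox.com": "file",
    "imgur.com": "image",
    "photobucket.com": "image",
    "tinypic.com": "image",
}


def pick_recipe(url):
    """Return the recipe label for a URL, or "catch-all" if no host matches."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host, then each parent domain: i.imgur.com, then imgur.com.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in RECIPES:
            return RECIPES[candidate]
    return "catch-all"


if __name__ == "__main__":
    print(pick_recipe("http://i.imgur.com/abc.png"))
    print(pick_recipe("http://example.org/~me/castle.zip"))
```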
There's also a general catch-all for people who host things on normal home pages, as Tim Berners-Lee intended. If your URL looks like the URL to an image or a binary archive, I will ask for that URL. If you serve me the image or the binary instead of an HTML file telling me to click on something, then I'll archive the file.
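The catch-all boils down to two checks: does the URL look like a file, and did the server actually send a file rather than a web page? Here is a rough sketch of both, assuming a hand-picked extension list and a decision based on the `Content-Type` header. The function names and the extension set are mine, not taken from the real scripts.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of extensions worth asking for; the real list is unknown.
FILE_EXTENSIONS = {
    ".zip", ".rar", ".7z", ".jar", ".schematic",
    ".png", ".jpg", ".jpeg", ".gif",
}


def looks_like_file_url(url):
    """True if the URL's path ends in an image or binary-archive extension."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in FILE_EXTENSIONS


def served_a_file(content_type):
    """True unless the Content-Type header says the response is an HTML page."""
    # The header can carry parameters, e.g. "text/html; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    return media_type not in ("text/html", "application/xhtml+xml")


def should_archive(url, content_type):
    """Archive only when both the URL and the response look like a file."""
    return looks_like_file_url(url) and served_a_file(content_type)
```

For example, `should_archive("http://example.org/maps/castle.zip", "application/zip")` is true, while the same URL answered with `text/html` is a "click here to download" page and gets skipped.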
I decode most link shorteners except for the ones that make you click through ads, mainly gwd.es and gwd.es. The archive had about 18, maps behind gwd.es links, and I spent a lot of time running Selenium clients clicking through the ads to discover the Mediafire links. I think that took a month. This time there were about new maps behind gwd.es links and I just didn't bother.
There are two big blind spots in my dataset, and they're the same as last time. One is mods. A lot of mods are hosted on Github and CurseForge, two big sites I didn't write recipes for. There's also the issue of mod packs, which have been steadily growing in popularity and complexity as development on core Minecraft winds down. Thanks to things like the Hardcore Questing Mod, modpacks are entering the "custom challenge" territory previously occupied solely by world archives.
There are sites that list mod packs, but I don't want to spend the time figuring out how to archive all the mod packs. There's also the problem that mod packs are huge.
The second blind spot is servers. It's theoretically possible to join a public Minecraft server with a modded client and automatically archive the map, but realistically it ain't gonna happen. I complained about this last time, but now I've done an assessment of what's being lost.
Planet Minecraft has a big server list that mentions the last time it was able to ping any particular server. There doesn't seem to be any purging of dead servers, so I'm able to get good measurements of the typical lifecycle.
Of the k servers in the list, 12k are "online" (the most recent Planet Minecraft ping was successful), 51k are "offline" (the most recent Planet Minecraft ping failed, but there was a successful ping less than two weeks ago), and 73k I declare "dead" (the last successful ping was more than two weeks ago). It seems really weird that nearly half of the 'offline' servers went offline in the past two weeks, so something's going on there; maybe Planet Minecraft's ping process is unreliable, or it just takes a long time to check every server, or servers go up and down all the time.
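Those three buckets are easy to express in code. A minimal sketch, assuming each server record gives you whether the latest ping succeeded and when the last successful ping happened (the `classify` function and its parameters are illustrative names, and I treat "exactly two weeks" as dead, which the definitions above leave open):

```python
from datetime import datetime, timedelta

DEAD_AFTER = timedelta(weeks=2)


def classify(last_ping_ok, last_success, now):
    """Bucket a server as "online", "offline", or "dead"."""
    if last_ping_ok:
        return "online"
    # The latest ping failed: recent success means offline, otherwise dead.
    if last_success is not None and now - last_success < DEAD_AFTER:
        return "offline"
    return "dead"


if __name__ == "__main__":
    now = datetime(2015, 2, 21)  # arbitrary "today" for the example
    print(classify(True, now, now))
    print(classify(False, now - timedelta(days=3), now))
    print(classify(False, now - timedelta(days=90), now))
```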
Anyway, the median lifetime for a public Minecraft server is days, a little over a year. These things go online, people do a bunch of work on them, and then they disappear. I've kind of gotten to 'acceptance' on this, but it's still obnoxious.
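For the curious, a lifetime figure like that can be computed as the median of "last seen minus first seen" across servers. This is a sketch of that arithmetic only; the sample dates are invented, and whether still-online servers belong in the sample is a judgment call I'm not making here.

```python
from datetime import date
from statistics import median


def median_lifetime_days(servers):
    """servers: iterable of (first_seen, last_seen) date pairs."""
    lifetimes = [(last_seen - first_seen).days for first_seen, last_seen in servers]
    return median(lifetimes)


if __name__ == "__main__":
    sample = [
        (date(2012, 1, 1), date(2012, 1, 11)),
        (date(2012, 1, 1), date(2013, 1, 1)),
        (date(2013, 3, 1), date(2013, 3, 31)),
    ]
    print(median_lifetime_days(sample))
```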
One final thing: I thought I'd check if I could see the result of Mojang's June announcement of rules for how you can make money by hosting servers (and, more importantly, how you can't). I wanted to see if these rules had a chilling effect on the formation of new servers or caused a lot of old servers to shut down.
And no, not really. Here's a chart showing two sixty-day periods around June 12, the date of the Mojang blog post. For each day I show 'births' (the number of servers first seen on that day) and 'deaths' (the number of servers last seen on that day). There's a drop-off in new servers around the end of July, but then it picks up again stronger than before. I don't have an explanation for it but I don't think there's anything in here you can pin on a blog post. The Mojang rules were probably intended to go after a small number of large obnoxious servers, and everyone else either doesn't care or flies under the radar.
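The births-and-deaths tally behind a chart like that is just two counters keyed by date. A sketch, assuming the same (first seen, last seen) pairs per server and a window of sixty days on either side of a center date; the function name and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def births_and_deaths(servers, center, days=60):
    """Count servers first seen and last seen on each day near `center`.

    servers: iterable of (first_seen, last_seen) date pairs.
    Returns two Counters keyed by date: (births, deaths).
    """
    servers = list(servers)
    start = center - timedelta(days=days)
    end = center + timedelta(days=days)
    births = Counter(first for first, _ in servers if start <= first <= end)
    deaths = Counter(last for _, last in servers if start <= last <= end)
    return births, deaths


if __name__ == "__main__":
    sample = [
        (date(2014, 6, 1), date(2014, 6, 20)),
        (date(2014, 6, 1), date(2014, 9, 30)),
        (date(2013, 1, 1), date(2014, 6, 20)),
    ]
    # The year is picked for illustration only.
    births, deaths = births_and_deaths(sample, center=date(2014, 6, 12))
    for day in sorted(set(births) | set(deaths)):
        print(day, births[day], deaths[day])
```

One caveat with this approach: a server that is still alive has a "last seen" date near the capture date, so deaths close to the end of the data are overcounted.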
(Screenshot is from World #57 by Art_Fox. I didn't archive the map because it's behind a gwd.es link, but I got the screenshot.)
PS: Congratulations to Anticraft, the oldest public Minecraft server I could find that's still online, added to Planet Minecraft on February 28,
Update: I fixed up the gwd.es code and let it run for another two weeks (!), saving another Minecraft maps and MCPE maps. I probably won't do this again because it's a huge pain, but I said that this time and ended up doing it out of some sense of obligation to the future, so maybe obligation will strike again, who knows.