
Extracting files from a Moodle 2 backup ...

Prerequisites: Mac OS X or Linux server with xmllint and command-line php installed.


Don't know if you have xmllint? Open a terminal session and type:

whereis xmllint

If the response is something like this:

/usr/bin/xmllint

You have it!
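On some systems whereis only looks in a fixed set of standard directories, so a copy installed elsewhere on the PATH can be missed. A minimal sketch of a check that asks the shell itself; the function names have_tool and check_tools are made up for this example and are not part of the scripts described here:

```shell
# have_tool and check_tools are hypothetical helper names.
# have_tool succeeds when the named command can be found on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return 1 if any tool is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After pasting the functions into a terminal, check_tools xmllint php reports on the two tools the scripts rely on.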

This is a series of scripts (bash and php) that 'extract' files from the
Moodle 2 'file system' back into human-recognizable file names, that is, the
file names that were used when uploading.

All the files linked in a Moodle 2 course backup are listed in a files.xml file,
and the actual 'files' (named by their contenthash) reside in the files folder
of the backup.

Here's what one entry (there could be multiple) in files.xml looks like:

<file id="1730">
<contenthash>276f7718d18a9b776757c052f2dd85b9b55ccfbf</contenthash>
<contextid>1751</contextid>
<component>mod_resource</component>
<filearea>content</filearea>
<itemid>0</itemid>
<filepath>/</filepath>
<filename>Moodlemorphosis.ppt</filename>
<userid>3</userid>
<filesize>124416</filesize>
<mimetype>application/vnd.ms-powerpoint</mimetype>
<status>0</status>
<timecreated>1361734463</timecreated>
<timemodified>1361734463</timemodified>
<source>Moodlemorphosis.ppt</source>
<author>$@NULL@$</author>
<license>unknown</license>
<sortorder>1</sortorder>
<repositorytype>$@NULL@$</repositorytype>
<repositoryid>$@NULL@$</repositoryid>
<reference>$@NULL@$</reference>
</file>
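Each entry carries both the contenthash and the filename, so the two can also be read together in one pass. A rough sketch with awk, assuming files.xml is laid out exactly as in the entry above, with one element per line (a real XML parser is the safer choice for anything else). The name list_pairs is made up, and skipping entries named "." rests on the assumption that those are directory placeholders rather than real files:

```shell
# list_pairs is a hypothetical helper, not one of the scripts described here.
# Prints "contenthash<TAB>filename" for every <file> entry in the given XML file.
# Assumes one element per line, as in the sample entry.
list_pairs() {
    awk '
        # Remove the tags and any surrounding whitespace from a line.
        function text(s) {
            gsub(/<[^>]*>/, "", s)
            gsub(/^[ \t]+|[ \t]+$/, "", s)
            return s
        }
        /<contenthash>/ { hash = text($0) }
        /<filename>/    { name = text($0) }
        # At the end of an entry, print the pair and reset for the next one.
        /<\/file>/ {
            if (name != "" && name != ".") print hash "\t" name
            hash = ""; name = ""
        }
    ' "$1"
}
```

Running list_pairs files.xml in the work directory prints one tab-separated pair per line.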

The file actually resides in files/27/ (the subfolder is named after the first
two characters of the contenthash):

Ken-Tasks-MacBook-Pro:ktask$ ls files/27/
276f7718d18a9b776757c052f2dd85b9b55ccfbf
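The location of a file can therefore be worked out from its contenthash alone. A small sketch, assuming the files/<first two characters>/<contenthash> layout seen in this example; hash_to_path is a made-up name:

```shell
# hash_to_path is a hypothetical helper, not one of the scripts described here.
# Assumes the layout files/<first two characters of the hash>/<hash>.
hash_to_path() {
    local hash="$1" prefix
    # The first two characters of the hash name the subfolder.
    prefix=$(printf '%s' "$hash" | cut -c1-2)
    printf './files/%s/%s\n' "$prefix" "$hash"
}
```

For the entry above, hash_to_path 276f7718d18a9b776757c052f2dd85b9b55ccfbf prints ./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf.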

The bash script called 'buildfiles' (run via: source buildfiles [ENTER]) uses
xmllint to build two text files: contenthash.txt and filenames.txt.

xmllint --xpath //contenthash files.xml > contenthash.txt
xmllint --xpath //filename files.xml > filenames.txt

xmllint, however, still includes the XML tags around the elements we need:

<filename>Moodlemorphosis.ppt</filename>
<contenthash>276f7718d18a9b776757c052f2dd85b9b55ccfbf</contenthash>

This is where the two php scripts come into play: stripcontenthash.php and
stripfilenames.php. They remove the remaining XML tags, leaving a plain listing
of contenthashes and filenames.
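The two php scripts are not listed in this document, so the following is not their code. As a rough illustration of the tag-stripping step, sed can do a similar job, assuming xmllint wrote one element per line as shown above (if a given xmllint version runs all matches together on one line, this sketch will not separate them). The name strip_tags is made up:

```shell
# strip_tags is a hypothetical stand-in, not the author's PHP scripts.
# Reads xmllint output on stdin, one element per line, and prints only
# the text between the tags. Lines that end up empty are dropped.
# XML entities such as &amp; are left as they are.
strip_tags() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}
```

A typical call writes to a new file rather than the one being read, for example strip_tags < filenames.txt > filenames.clean.txt.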

contenthash.txt now looks like:

774e518498f923865adce2442adc1ffac1adf38a
da39a3ee5e6b4b0d3255bfef95601890afd80709
0ed1e7611f937fa16bc7ba239a07b88f7a1a7a12
da39a3ee5e6b4b0d3255bfef95601890afd80709
013ee7039a7ec06834ecd2798d99cb210aeaa5ab
0ef65e0cf83d71c70e96126c55a91bcf0441b356
276f7718d18a9b776757c052f2dd85b9b55ccfbf
da39a3ee5e6b4b0d3255bfef95601890afd80709
44155b4549bba8f706f576300ae944bec872944a
da39a3ee5e6b4b0d3255bfef95601890afd80709
276f7718d18a9b776757c052f2dd85b9b55ccfbf
da39a3ee5e6b4b0d3255bfef95601890afd80709

filenames.txt now looks like this:

01-_Getting_your_Moodle_Class_Made.ppt
f1.png
f2.png
f3.png
Moodlemorphosis.ppt
Moodlemorphosis.swf
Moodlemorphosis.ppt

Next step: get file paths from the contenthash.txt file. The bash script
getfilepaths uses the contenthash.txt file and the operating system's find
command to create a filepaths.txt file:

./files/77/774e518498f923865adce2442adc1ffac1adf38a
./files/0e/0ed1e7611f937fa16bc7ba239a07b88f7a1a7a12
./files/01/013ee7039a7ec06834ecd2798d99cb210aeaa5ab
./files/0e/0ef65e0cf83d71c70e96126c55a91bcf0441b356
./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf
./files/44/44155b4549bba8f706f576300ae944bec872944a
./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf
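The getfilepaths script is not listed here. A minimal sketch of a find-based lookup, with list_file_paths as a made-up name, might look like the following. Note that da39a3ee5e6b4b0d3255bfef95601890afd80709 (the SHA-1 of empty content) appears in contenthash.txt but not in the listing above; a lookup like this one prints nothing for a hash that has no file under files, which would match that behaviour.

```shell
# list_file_paths is a hypothetical sketch, not the author's getfilepaths.
# For every hash in the list (first argument) it prints the matching file
# found under the files directory (second argument, default ./files).
# Hashes with no file on disk print nothing; repeated hashes print again.
list_file_paths() {
    local list="$1" root="${2:-./files}" hash
    while IFS= read -r hash; do
        [ -n "$hash" ] || continue
        find "$root" -type f -name "$hash"
    done < "$list"
}
```

Run from the work directory as list_file_paths contenthash.txt > filepaths.txt.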

source getfilepaths
File paths ...
--------------
./files/77/774e518498f923865adce2442adc1ffac1adf38a
./files/0e/0ed1e7611f937fa16bc7ba239a07b88f7a1a7a12
./files/01/013ee7039a7ec06834ecd2798d99cb210aeaa5ab
./files/0e/0ef65e0cf83d71c70e96126c55a91bcf0441b356
./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf
./files/44/44155b4549bba8f706f576300ae944bec872944a
./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf
--------------
--------------
File Names ...
--------------
01-_Getting_your_Moodle_Class_Made.ppt
f1.png
f2.png
f3.png
Moodlemorphosis.ppt
Moodlemorphosis.swf
Moodlemorphosis.ppt
--------------

Last script: php getcommands.php builds a listing of commands one may use:

php getcommands.php
cp -p ./files/77/774e518498f923865adce2442adc1ffac1adf38a ./extracted/01-_Getting_your_Moodle_Class_Made.ppt
cp -p ./files/0e/0ed1e7611f937fa16bc7ba239a07b88f7a1a7a12 ./extracted/f1.png
cp -p ./files/01/013ee7039a7ec06834ecd2798d99cb210aeaa5ab ./extracted/f2.png
cp -p ./files/0e/0ef65e0cf83d71c70e96126c55a91bcf0441b356 ./extracted/f3.png
cp -p ./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf ./extracted/Moodlemorphosis.ppt
cp -p ./files/44/44155b4549bba8f706f576300ae944bec872944a ./extracted/Moodlemorphosis.swf
cp -p ./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf ./extracted/Moodlemorphosis.ppt
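getcommands.php is not listed here either. Judging from the output, it pairs the Nth file path with the Nth file name; that pairing is an inference, and the sketch below is only one way to do it in the shell. The names shell_quote and print_copy_commands are made up. Unlike the output above, this version wraps each argument in single quotes so that file names containing spaces survive being pasted into a terminal.

```shell
# shell_quote and print_copy_commands are hypothetical helpers,
# not the author's getcommands.php.

# Wrap a string in single quotes, turning each embedded ' into '\''.
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Pair line N of the paths list with line N of the names list and print
# one cp command per pair. The third argument is the target folder.
print_copy_commands() {
    local paths="$1" names="$2" dest="${3:-./extracted}" path name
    # File descriptors 3 and 4 let both lists be read in step.
    while IFS= read -r path <&3 && IFS= read -r name <&4; do
        printf 'cp -p %s %s\n' "$(shell_quote "$path")" "$(shell_quote "$dest/$name")"
    done 3< "$paths" 4< "$names"
}
```

For example, print_copy_commands filepaths.txt filenames.txt > commands.txt. The pairing is only as good as the two lists: if they differ in length or order, the names will be attached to the wrong files.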

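If one copies one of those lines to the clipboard and pastes it back into the
terminal, the contenthash-named file is copied to the ./extracted/ folder under
the human-recognizable name it had when it was uploaded. (cp -p preserves the
file's timestamps and permissions, and its ownership where allowed.) The
./extracted/ folder has to exist for the copy to succeed (mkdir extracted if it
does not), and two entries with the same file name, such as Moodlemorphosis.ppt
above, are written to the same destination.

OR one can issue: source commands.txt [ENTER]
and all of the files will be copied to the extracted directory.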

Another script (getinfo) shows what kind of file each contenthash-named file is:

./files/27/276f7718d18a9b776757c052f2dd85b9b55ccfbf
CDF V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535,
Title: Moodlemorphosis: The Transformation an ELAR Classroom via Online Learning,
Last Saved By: MAVALDEZ, Revision Number: 50, Total Editing Time: 15:11:47,
Last Printed: Fri Dec 3 13:10:00 2010, Create Time/Date: Fri Dec 3 12:17:23 2010,
Last Saved Time/Date: Tue Aug 21 04:09:39 2012

The document type and title tell us that 276f7718d18a9b776757c052f2dd85b9b55ccfbf
is Moodlemorphosis.ppt.
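The getinfo script itself is not listed here. Its output has the shape of what the standard file utility reports, so a rough equivalent, assuming file is installed and using the made-up name show_info, might be:

```shell
# show_info is a hypothetical sketch, not the author's getinfo script.
# Prints each path from the list, followed by what the file utility
# reports about it. Paths that do not exist are skipped.
show_info() {
    local list="$1" path
    while IFS= read -r path; do
        [ -f "$path" ] || continue
        echo "$path"
        # -b prints the description without repeating the file name.
        file -b "$path"
    done < "$list"
}
```

Run from the work directory as show_info filepaths.txt.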

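So the quickie (for busy folks):

1. create a work directory for extracting: mkdir work
2. copy your .mbz file into the work directory and cd into it
3. extract from the .mbz the files directory and the files.xml file:
tar zxvf [nameofbackup].mbz files files.xml
(if tar reports that the file is not in gzip format, the backup may be a zip
archive, which older Moodle 2 releases produced; in that case try:
unzip [nameofbackup].mbz 'files/*' files.xml)
4. copy the bash and php scripts into that directory:
buildfiles, stripcontenthash.php, stripfilenames.php, getfilepaths,
getcommands.php, doit
or download and unpack them there:
wget http://sos.tcea.org/extractscripts.zip
unzip extractscripts.zip
5. execute in this order: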

source buildfiles
php stripcontenthash.php
php stripfilenames.php
source getfilepaths
php getcommands.php

REALLY in a hurry ... copy all the bash and php scripts into the work directory.
Issue: source doit [ENTER]
(the doit script executes the steps above in their proper order)
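The doit script is not listed in this document. A wrapper along these lines would run the five steps in the order given above and stop at the first failure; run_all is a made-up name, and the check for files.xml and the files folder is an addition of this sketch:

```shell
# run_all is a hypothetical sketch of what a wrapper such as doit might do;
# it is not the author's script. It must be run from the work directory.
run_all() {
    # Refuse to start unless the extracted backup parts are present.
    if [ ! -f files.xml ] || [ ! -d files ]; then
        echo "files.xml and the files directory must be in $(pwd)" >&2
        return 1
    fi
    # Same order as the step list; stop as soon as one step fails.
    . ./buildfiles            || return 1
    php stripcontenthash.php  || return 1
    php stripfilenames.php    || return 1
    . ./getfilepaths          || return 1
    php getcommands.php       || return 1
}
```

Stopping early avoids generating copy commands from half-built lists.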
