Fax.com Ripping Script

Here’s a quick and dirty little script I wrote to rip fax.com. I had a bunch of faxes that I wanted to download, so I put this together for my own use, then added a little extra to make it potentially useful to someone else. My apologies in advance: it’s pretty rough and will likely break at some point in the future.

Fax.com will email you incoming faxes, which is great for archival going forward; unfortunately, I hadn’t been saving my emails. Fax.com charges an overage fee of $0.05/page for faxes stored over 30 days. They have a nice feature to automatically purge faxes older than 30 days for you, but if you have already been paying the overage fee and would like to archive your faxes first, you might find this handy to run before enabling auto-purge.

My account is set to display 25 faxes per page (under the general settings for my account), so you’ll need to tweak the FAXES_PER_PAGE variable (as well as the USERNAME and PASSWORD fields) to match your own settings before running this script. I probably could have made the script look up how many faxes per page you’re set to, but I was lazy.
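Once those variables are filled in (or USERNAME and PASSWORD left blank to be prompted), running it is nothing special. Something like this should do it, assuming the script unzips as faxcomrip.sh (adjust the name to whatever is actually in the zip):

chmod +x faxcomrip.sh
./faxcomrip.sh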

Download faxcomrip.zip

#!/bin/bash

USERNAME=""
PASSWORD=""
ARCHIVE_DIR="./archive"
FAXES_PER_PAGE="25"


# Bail out early if curl isn't available.
if ! command -v curl >/dev/null 2>&1; then
        echo "Sorry, couldn't find curl" >&2
        exit 1
fi

if ! [ -d "$ARCHIVE_DIR" ]; then
        echo "Making Archive Directory"
        if ! mkdir -p "$ARCHIVE_DIR"; then
                echo "Error: couldn't create $ARCHIVE_DIR" >&2
                exit 1
        fi
fi

if [ -z "$USERNAME" ]; then
        echo -n "Fax.com username: "
        read USERNAME
fi
if [ -z "$PASSWORD" ]; then
        echo -n "Fax.com password: "
        stty -echo
        read PASSWORD
        stty echo
        echo ""
fi

cd "`dirname $0`"

# Start with a fresh cookie jar.
rm -f cookies.txt

# Log in; curl keeps the session cookie in cookies.txt, and the response
# includes the inbox listing with the total item count.
page_results="$(curl -s -3 -L -c cookies.txt -b cookies.txt --data-urlencode "username=$USERNAME" --data-urlencode "password=$PASSWORD" -d "shadow=&login=Login" https://secure.fax.com/UnifiedLogin.serv)"

pages="${page_results##*Item(s) of}" ; pages=${pages%%<*} ; pages=$((pages / $FAXES_PER_PAGE)) ; ((++pages))
echo "Total pages: $pages"

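# fetch_pdf reads inbox rows (one line of HTML per fax) on stdin, scrapes out
# the filename, page count, CSID, caller ID and received date with parameter
# expansion, and downloads any PDF that isn't already in the archive.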
fetch_pdf () {
        while read -r line
        do
                # The fax's original filename, from the name="foo.pdf" attribute.
                fax_name="${line#*name=\"}" ; fax_name="${fax_name%%\"*}"
                fax_name="${fax_name%.pdf}"

                echo "$fax_name"

                # Page count sits in the cell after the second occurrence of the name.
                fax_pages="${line#*"$fax_name"}" ; fax_pages="${fax_pages#*"$fax_name"}"
                fax_pages="${fax_pages#*sOut();\">}" ; fax_pages="${fax_pages%%<*}"

                # Sending station ID and caller ID, stripped of dashes and pluses.
                csid="${line#*CSID=}" ; csid="${csid%%\&*}"
                csid="${csid//-/}" ; csid="${csid//+/}"

                caller_id="${line#*ANI=}" ; caller_id="${caller_id%%\"*}"
                caller_id="${caller_id//-/}" ; caller_id="${caller_id//+/}"

                # Received date lives in the last <td>; strip the markup, shuffle
                # the year to the front, then convert to a Unix timestamp (GNU date).
                fax_date="${line##*<td}" ; fax_date="${fax_date#*>}"
                fax_date="${fax_date%%</td*}" ; fax_date="${fax_date//&nbsp;/}"
                year="${fax_date##*-}" ; year="${year%% *}"
                fax_date="$year-${fax_date//-$year/}"
                fax_date="${fax_date//<font color=\"red\">/}" ; fax_date="${fax_date//<\/font>/}"
                fax_date="$(date -d "$fax_date" +%s)"

                filename="$fax_date-$fax_pages-$caller_id-$csid-$fax_name"
                pdf_url="${line#*href=\"}" ; pdf_url="${pdf_url%%\"*}"
                if ! [ -f "$ARCHIVE_DIR/$filename.pdf" ]; then
                        echo "Saving $fax_name as $filename.pdf"
                        curl -3 -L -c cookies.txt -b cookies.txt "https://secure.fax.com$pdf_url" -o "$ARCHIVE_DIR/$filename.pdf"
                        sleep 1 # Be nice[r] to the server
                #else
                #       echo "$pdf_url already downloaded"
                fi

        done
}

# Grab all remaining pages from oldest to newest.
while [ "$pages" -ge "0" ]
do
        echo "Processing page $pages"
        page_results="$(curl -s -3 -L -c cookies.txt -b cookies.txt "https://secure.fax.com/Inbox.serv?folder=Inbox&id=$((pages * FAXES_PER_PAGE))")"
        sleep 1
#       echo "New Page" >>broken.html
#       echo "$page_results" >>broken.html
        echo "$page_results" | grep 'EMViewer\.serv' | fetch_pdf
        ((--pages))
done

# Clean up the session cookie jar.
rm -f cookies.txt

echo "Done"