[SATLUG] wget frustration

Jon Mark Allen jm at allensonthe.net
Fri May 23 09:28:19 CDT 2008


Daniel Givens wrote:
[trim]
> 
> I would suggest you look into getting hooked up with the actual RSS
> feed and use a podcasting application. You can only hope they include
> a nice long history of previous shows in the feed. Of course, you
> could spend your Memorial Day weekend Googling for archives.
> 

First, let me say that I agree with Daniel.  Subscribing to a podcast
with RSS is the best way to go.  However, if you still want to pursue
using wget for this .....

Since the site doesn't have directory listings, you have to compile the
list of files you want yourself.  You can then feed that list to wget
using the -i option.

The file at http://www.the950.com/podcast/mornings.xml has a list of
episodes (the same ones you'd get in the podcast, of course), but it's
in XML and wget doesn't care for that so much.  The trick, then, is to
extract only the urls you want to fetch.

So first, I used wget to retrieve the mornings.xml file:

wget http://www.the950.com/podcast/mornings.xml

Each episode in mornings.xml looks like this:

<item>
<title>Best of Walton and Johnson</title>
<comments></comments>
<pubDate>Thu, 22 May 2008 10:36:12 -14400</pubDate>
<description>Gas from Water/Tax breaks from Congress</description>
<imageurl></imageurl>
<enclosure
url="http://a1135.g.akamai.net/f/1135/18227/1h/cchannel.download.akamai.com/18227/podcast/HOUSTON-TX/KPRC-AM/5.22.08-8A_4.mp3?CPROG=PCAST&amp;MARKET=HOUSTON-TX&amp;NG_FORMAT=talk&amp;SITE_ID=676&amp;STATION_ID=KPRC-AM&amp;PCAST_AUTHOR=Walton_%26_Johnson&amp;PCAST_CAT=Walton_%26_Johnson&amp;PCAST_TITLE=Walton_and_Johnson"
length="" type="audio/mpeg" />
<itunes:subtitle></itunes:subtitle>
<itunes:summary>Best of</itunes:summary>
<guid>http://a1135.g.akamai.net/f/1135/18227/1h/cchannel.download.akamai.com/18227/podcast/HOUSTON-TX/KPRC-AM/5.22.08-8A_4.mp3</guid>
<category>Talk</category>
<itunes:category>Talk</itunes:category>
<itunes:explicit>No</itunes:explicit>
<itunes:duration>9</itunes:duration>
<itunes:keywords></itunes:keywords>
</item>

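The url we want shows up twice: in the <enclosure> url attribute (with a
long query string and &amp; entities attached) and, cleanly, in the
<guid> element.  The <guid> copy is the easier one to use, but we still
need to remove the tags around it.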

sed -n -e 's/^.*\(http.*mp3\).*/\1/p' keeps only the text from 'http'
through 'mp3' on each line it matches; -n together with the trailing p
means lines that don't match are not printed at all.  Put it all
together and redirect to a file for wget:

grep guid mornings.xml | sed -n -e 's/^.*\(http.*mp3\).*/\1/p' > urls.txt

(all on one line of course.)
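
If you'd rather not depend on grep, the same extraction can probably be
done in a single sed call.  This is only a sketch: extract_mp3_urls is a
name I made up, and it assumes each <guid> element sits on one line, as
it does in the sample above.

```shell
#!/bin/sh
# Sketch: print the .mp3 URL from every <guid> line read on stdin.
# extract_mp3_urls is a hypothetical helper name, not part of sed or wget.
extract_mp3_urls() {
    # Anchoring on the <guid> tags means the <enclosure> line is ignored.
    # [^<]* stops at the closing tag, so only the URL itself is captured.
    sed -n -e 's/^.*<guid>\(http[^<]*\.mp3\)<\/guid>.*$/\1/p'
}
```

Usage would be: extract_mp3_urls < mornings.xml > urls.txt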

Next, tell wget to read the urls.txt file for the list of files to retrieve:

wget -i urls.txt
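
If you expect to re-run this as new shows appear, a small wrapper along
these lines might save some bandwidth.  It is a sketch, not a tested
recipe for this site: fetch_list and the WGET variable are names I
invented (WGET only exists so you can swap in another command for a dry
run), and the 2 second wait is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: fetch every URL listed in a file, skipping files already on disk.
# fetch_list and the WGET override are hypothetical, for illustration only.
fetch_list() {
    list=$1
    # An empty or missing list usually means the extraction matched nothing.
    if [ ! -s "$list" ]; then
        echo "fetch_list: $list is empty or missing" >&2
        return 1
    fi
    # -nc: do not re-download files that already exist locally
    # -w 2: wait 2 seconds between retrievals
    "${WGET:-wget}" -nc -w 2 -i "$list"
}
```

Usage would be: fetch_list urls.txt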

OR, you could subscribe with an RSS reader to the podcast..... :-D

JM


