Netzwelt titanic-magazin.de liferea/snownews scraping
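A small Ruby snippet that turns the titanic-magazin.de feed into a full-content feed. It reads the RSS feed on stdin and writes the expanded feed to stdout, so it can be plugged in as a feed filter in Liferea or Snownews: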
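#!/usr/bin/ruby
# Reads an RSS feed on stdin and writes it to stdout with each item's
# description replaced by the full article text.
require 'net/http'
require 'rexml/document'

# Fetch the article page and return its body text as UTF-8, or nil if the
# expected table cell is not found.
def get_item(url)
  page = Net::HTTP.get_response(URI.parse(url)).body
  match = page.match(/<td class="tt_news-bodytext">(.*?)<\/td>/m)
  # The pages are ISO-8859-1; String#encode replaces the iconv library,
  # which was removed from the standard library in Ruby 2.0.
  match && match[1].force_encoding('ISO-8859-1').encode('UTF-8')
end

feed = REXML::Document.new($stdin.read)
feed.elements.each('rss/channel/item') do |element|
  body = get_item(element.elements['link'].text)
  # Keep the original description when the article body could not be extracted.
  element.elements['description'].text = body if body
end
feed.write($stdout, 0)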
It is a little slow, because every item is fetched again on each run and there is no caching (yet).
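One way the missing caching could look is a small on-disk cache keyed by the article URL. This is only a sketch, not part of the original script: the helper name cached_fetch, the cache directory name, and the assumption that an article never changes once published (so entries never expire) are all made up for illustration.

```ruby
require 'digest/sha1'
require 'fileutils'
require 'tmpdir'

# Hypothetical default location; any writable directory would do.
DEFAULT_CACHE_DIR = File.join(Dir.tmpdir, 'titanic-feed-cache')

# Hypothetical helper: returns the cached text for url if it exists,
# otherwise calls the given block to fetch it and stores the result.
# Entries never expire (assumption: published articles do not change).
def cached_fetch(url, cache_dir = DEFAULT_CACHE_DIR)
  FileUtils.mkdir_p(cache_dir)
  # Hash the URL so it can be used safely as a file name.
  path = File.join(cache_dir, Digest::SHA1.hexdigest(url))
  return File.read(path, encoding: 'UTF-8') if File.exist?(path)

  content = yield(url)
  # Do not cache failed fetches, so they are retried on the next run.
  File.write(path, content) unless content.nil?
  content
end
```

In the script above, the fetch inside the loop would then become something like cached_fetch(link) { |u| get_item(u) }, so that only items not seen before hit the network.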