Screenshots of the MSM's webpages. (Open Project)

Why not just use archive.org's crawler to do this, instead of reinventing the wheel?
http://crawler.archive.org/

Looks easy enough, but I don't have a hosting spot with a big chunk of storage space. As political sites may have video or other large files, I'd imagine 50 sites, captured every 2-4 hours, would add up rather fast.

If anyone has hosting space but isn't up to taking this on, send me a PM and I can do it.
 
 
I have to ask what information we really want to obtain, and whether or not this will advance the campaign. If so, how? I am not asking this to be difficult, but I want to make sure that we are doing something that will get us more votes in the primary, or more money.

I do not believe straight screenshots offer much value. In my opinion, we need to scrape the data and then collect some information on how often different candidates (or non-candidates) appear within each article. Obtaining the data isn't hard; a simple wget script run from cron can get that job done. However, this will leave us with quite a bit of data to analyze... do we want to show when articles have been changed? This can be done, but each site that we scrape will have to be configured for this purpose.

Do we want to put together histograms of how often each key term appears within an article? For instance, if we see an article where the words GOP, nomination, and Republican occur frequently, we would expect to see the names of other candidates too... if Ron Paul is missing from it, something fishy is going on and we can detect it. Obtaining information automatically from online polls is a bit trickier, but can also be done. This would allow us to really quantify the coverage of each campaign, but to what end? What will happen when we have this information?
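The keyword-count idea could be sketched in a few lines of shell, assuming the page has already been saved locally (the `count_names` function and the candidate list here are illustrative, not something settled in this thread):

```shell
# count_names FILE
# Print one "Name count" line per candidate, counting case-insensitive
# occurrences of each name in FILE (a saved copy of an article page).
count_names() {
  for name in Romney Perry Bachmann Santorum Paul; do
    # grep -o prints one line per match; wc -l counts them.
    n=$(grep -o -i "$name" "$1" | wc -l)
    printf '%s %d\n' "$name" "$n"
  done
}
```

Run `count_names article.html` over each archived page and a Ron-Paul-shaped hole in the coverage shows up as a zero.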
 
I may be able to throw something together, but I'm not sure how we'd go about automatically reporting the "offending" screencaps. Perhaps users of the site vote on the offending caps -- both standalone (e.g. the Politico shot above) and comparisons (first you see it, now you don't). What do you guys think?

Any ideas for a domain name? If someone can get that and maybe a VPS (like slicehost.com), I'll build it.

-Josh

Josh, what about integrating Google Alerts to pull any live news update with the referenced keywords? We could then combine the RSS feeds using Yahoo Pipes and have a live stream of those updates posted to a blog (WordPress). I have a few hundred sites running and don't mind popping up another one for this process if we need it, preferably WordPress. I've got five different VPSs that I use for our SEO/SMO campaigns for clients, so we could knock it out pretty quick.
 
I have to ask what information we really want to obtain, and whether or not this will advance the campaign. If so, how? I am not asking this to be difficult, but I want to make sure that we are doing something that will get us more votes in the primary, or more money.

This is a side project that is unrelated to getting votes in 2011. We might be able to use it in 2012.

It's a screenshot of home pages and politics sections to showcase what is promoted by the editors of websites.

I do not believe straight screenshots offer much value. In my opinion, we need to scrape the data and then collect some information on how often different candidates (or non-candidates) appear within each article. Obtaining the data isn't hard; a simple wget script run from cron can get that job done. However, this will leave us with quite a bit of data to analyze... do we want to show when articles have been changed? This can be done, but each site that we scrape will have to be configured for this purpose.
This would be good but I didn't want to burden anyone with a large project.


Here's what I made in 2007 from just a few screenshots. http://www.votemotion.com/ronpaul.php

I wish I could go back to 2007 and show what the webpages of cnn.com/politics looked like on a weekly basis and then compare that to Fox, CATO, Reason, RawStory, etc.
 
Okay, so it sounds like all you want is a grab of specific articles... this is easy enough to do then if you are willing to sift through the data yourself. Do you want to be able to say which particular articles you are "watching" or just grab all articles from the page at once? I agree that the data can always be "mined" later, we just want to capture it first.

I would suggest grabbing all articles within a single link of the main page of any target, then archiving them by site, date, and time. This is very simple to do. No intelligence, nothing fancy... just an archive that can be revisited at a later date. Something like Google's cache, then.
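A minimal sketch of that layout, assuming a `site/date/time` directory scheme (both the scheme and the particular wget flags are my suggestion, not something agreed in the thread):

```shell
# snapshot_dir SITE
# Print an archive path of the form SITE/YYYY-MM-DD/HHMM for the current time.
snapshot_dir() {
  printf '%s/%s/%s\n' "$1" "$(date +%F)" "$(date +%H%M)"
}

# Example: mirror everything within a single link of the front page into
# today's snapshot directory (network fetch, so shown commented out):
#   dir=$(snapshot_dir www.politico.com)
#   mkdir -p "$dir"
#   wget --recursive --level=1 --page-requisites --convert-links \
#        --directory-prefix="$dir" http://www.politico.com/
```

`--level=1` is what keeps the grab to "within a single link of the main page"; `--page-requisites` pulls the images and CSS so the archived copy still renders later.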
 
Not specific articles so much as the landing pages: each site's home page and its politics home page.

What have the senior editors been promoting to their audience?

The NY Times can write an article on Paul, but if they just toss it straight into the archives...

I can't look back into the past as Google Cache only has a screenshot that is like a week old.

If you want to add intelligence or customize the project, by all means, have at it.
 
Not specific articles so much as the landing pages: each site's home page and its politics home page.

What have the senior editors been promoting to their audience?

The NY Times can write an article on Paul, but if they just toss it straight into the archives...

I can't look back into the past as Google Cache only has a screenshot that is like a week old.

If you want to add intelligence or customize the project, by all means, have at it.


This could be integrated either by combining the feeds from the sites into one with Yahoo Pipes (if they have feeds) or by running a WordPress scraper on cron that could scrape at specific intervals.

http://wordpress.org/extend/plugins/wp-web-scrapper/other_notes/

I could get something up and running by way of site/location but would need some help from a PHP master since my team is busy with business operations.

That being said, it would be best to rate the priority of this because my time is very valuable and it might be better served performing another function for the RP campaign.
 
Why not just use wget to grab a list of sites and the associated content every half hour with a Bash script?
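Something like this, assuming a one-URL-per-line sites.txt (the file name and the DRY_RUN switch are mine, added so the loop can be exercised without hitting the network):

```shell
# grab_all SITES_FILE
# Fetch each site listed in SITES_FILE one link deep with wget.
# Set DRY_RUN=1 to print the commands instead of running them.
grab_all() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue         # skip blank lines
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo wget --recursive --level=1 --timestamping "$url"
    else
      wget --recursive --level=1 --timestamping "$url"
    fi
  done < "$1"
}

# A crontab entry then runs it every half hour, e.g.:
#   0,30 * * * * /home/you/grab_all.sh /home/you/sites.txt
```

`--timestamping` makes repeat runs cheap by only re-downloading files that have changed since the last grab.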
 
Here is a Huffington Post article from today. Lots of promotion of Perry and Bachmann. Notice that the photo for the sidebar story link shows the GOP candidates Santorum, Perry, and Bachmann. To be fair, the article did mention Ron Paul in one sentence near the end:

"Now, five months before the Republican nomination contests are to begin, the field is largely set with Romney, Perry and Bachmann clustered near the top of many surveys, followed by Texas Rep. Ron Paul."

Implying that Paul is not a "top tier" candidate.
 
Sorry for the late response, I've been busy at work. I can write a script today (or tomorrow) that will do this, but I do not have an accessible server set up for this (and I don't think I should use resources at my work for it). If someone has a Linux system they can run the scripts on, with MySQL, I will upload the script for them.
 
Here is another daily disrespect of Ron Paul. The following screenshot is from Business Insider on 8-29-11. It is a Rick Perry promotional piece that looks like it could have easily been written by the Perry campaign. The article states only two obstacles stand in Perry's way: Bachmann and Romney. Paul's name is not mentioned in the article, even though he is polling higher than Bachmann.
 
Here is Politico from 8-30-11 rehashing yesterday's Business Insider story. Below is the number of times each candidate's name appears on this web page:

Perry - 14

Romney - 6

Santorum - 3

Bachmann - 1

Paul - 0
 
From our good friends at Bloomberg on 8-30-11 comes a "humorous" look at the 2012 GOP candidates and non-candidates. The article focuses on the MSM-anointed top tier of Perry, Romney, and Bachmann. However, the article also mentions Tim Pawlenty, Donald Trump, and Sarah Palin, but gives zero mention of Ron Paul.
 
Sorry for the late response, I've been busy at work. I can write a script today (or tomorrow) that will do this, but I do not have an accessible server set up for this (and I don't think I should use resources at my work for it). If someone has a Linux system they can run the scripts on, with MySQL, I will upload the script for them.

I have a reseller account with a few unused domains residing on it if that works?
 