October 2008
5 posts
Data Scraping Wikipedia with Google Spreadsheets |... →
“The Google spreadsheet function =importHTML(””,”table”,N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page, and the target table element both need…
Oct 29th
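If you would rather do the same job outside a spreadsheet, here is a minimal Python sketch of the idea: pull the N-th table out of a page's HTML as rows of cell text. It uses only the standard library. The names `TableExtractor` and `import_table` are made up for this illustration, the 1-based N is an assumption about how the spreadsheet formula counts tables, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, n):
    """Return the n-th table on the page (assumed 1-based) as rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[n - 1]
```

Feed it the text of a page you have already downloaded, then write the rows out with the `csv` module if you want to open them in a spreadsheet.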
Election Protection - The Nation's Largest... →
“Throughout the election process, our volunteers - more than 10,000 strong - will be entering data and information into OurVote live (developed by the Electronic Frontier Foundation), an interactive…
Oct 28th
Innovate this: Searching dirty →
“Information that is supposed to be private can sometimes inadvertently leak onto the web, through careless coding, or scanning, or editing, or incorrect placement on a server.”
Oct 25th
Economic Data available for free online →
UK-focused but useful for U.S.-based reporting.
Oct 25th
PHP: file_get_contents - Manual →
Add one insert query and you’re good to go.
Oct 11th
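The same fetch-then-insert pattern can be sketched in Python for anyone not working in PHP. This is an illustration under assumptions, not the manual's code: the `pages` table, the `scrape.db` filename mentioned below, and the function names `fetch` and `save_page` are all hypothetical.

```python
import sqlite3
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save_page(conn, url, body):
    """Store one fetched page with a single parameterized INSERT."""
    # Hypothetical schema: one row per fetched page.
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, body TEXT)")
    conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (url, body))
    conn.commit()
```

To use it, open a connection with `sqlite3.connect("scrape.db")` and call `save_page(conn, url, fetch(url))`. The `?` placeholders keep scraped text from breaking the query.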
September 2008
38 posts
Scraping Links With PHP | Makebeta →
Sep 17th
Beautiful Soup documentation - Parse Tree →
Use the Parse Tree section of this Python-based parsing tool to scrape sites.
Sep 16th
How to do a Conditional HTTP GET with Python... →
Another Python script that can be useful for site scraping. This one will first check to see if anything on a site has changed before initializing the scrape.
Sep 16th
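For a rough idea of what a conditional GET looks like, here is a minimal sketch using Python's standard library. It is not the linked script. It assumes you saved the `ETag` and `Last-Modified` values from your previous fetch; the function names `conditional_headers` and `fetch_if_changed` are made up for this example.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the request headers that ask the server 'has this changed?'."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; treat it as "no change".
        if err.code == 304:
            return None, etag, last_modified
        raise
```

Store the returned `etag` and `last_modified` after each run and pass them back in next time. When the body comes back as `None`, skip the scrape.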
Searching a listserv's archives →
Need to search through thousands of listserv messages? Do it through email with this cheatsheet.
Sep 13th
FedStats Data Access Tools →
This is a fantastic index of science and environment-related datasets produced by the U.S. government. Check out DataFerrett (linked on page), which is the data extraction tool. It’s not easy to use, but…
Sep 13th
AFP reporters barred from using Wikipedia and... →
This is your warning not to use Wikipedia as a primary source. Too many reporters have been busted for using bad info from Wikipedia. I could show you examples, but I trust you only need to be warned…
Sep 13th
Peter Shankman's "If I Can Help a Reporter Out" →
This is a great list developed by PR guy Peter Shankman. If you want to find an interviewee but don’t know where to turn, try posting a message to this email list. Be sure to fully vet the person who…
Sep 13th
CataList, the official catalog of LISTSERV lists →
There are thousands of listservs (email discussion lists) that aren’t indexed by Google. This catalog is constantly updated with new public listservs. It’s a good place to dig for experts,…
Sep 13th
WikiScanner: List anonymous wikipedia edits from... →
Sep 13th
NIH Research Portfolio Online Reporting Tool... →
“Reports, data and analyses of NIH research and development activities.” The National Institutes of Health runs many, many research programs on public health. This is a good place to go to start…
Sep 13th
Data Access Tools from the Census Bureau →
Pure data, mapping information and other useful demographic info are collected by the U.S. Census Bureau. Great for understanding the big picture of your city, county, state and nation.
Sep 13th
CQ.com - MoneyLine →
Discover the influence of money on the political process.
Sep 13th
Members of Congress who Twitter - Congresspedia →
It’s true. Our congressmen and women are getting into the social media space. The list is pretty accurate, but as it’s a wiki, it carries the same caveats as all wiki sites: verify the info with…
Sep 13th
District of Columbia Data Catalog and Data Feeds →
Lots of state and local governments provide data not just in .csv format but also in RSS, XML and KML (spatial data, otherwise known as geodata for mapping). This is one example.
Sep 13th
BRB Publication's Public Record Resource Center →
This extremely useful index allows you to drill down to free data sources from your state, county and locality.
Sep 13th
FactCheck.org →
Sep 13th
Super Video Converter →
Converts almost any video format into QuickTime standard video. Very useful, per Emi Kolawole. PC only.
Sep 13th
Campaign Ad Spotlight →
Sep 13th
Real Player →
Emi Kolawole recommended Real Player. It has a Flash video download feature that will allow you to retain evidence of the existence of a video.
Sep 13th
Budget FY 2009 - Appendix Table of Contents →
The appendix is the budget breakdown, line by line. Very useful for seeing how government plans to spend taxpayer money.
Sep 13th
land records - Google Search →
Messy Google search that will bring you to land records links. Limit this to your area by including the name of your state, county or city in the search box.
Sep 13th
Pipl - People Search →
This is another pretty good database for finding background information about people. Like ZabaSearch, you’ll need to do further verification to know that the information is sound, but it’s another…
Sep 13th
FriendCSV | Facebook →
Facebook app allows you to pull down clean, organized data about your Facebook friends for import in-house. It’s useful, but be cautious in using it.
Sep 13th
Electoral-vote.com: President, Senate, House... →
Sep 13th
Free People Search by ZabaSearch →
This is one useful search engine for finding people, addresses and phone numbers when you don’t know anything about your subject. ZabaSearch claims to crawl public records, but it doesn’t disclose its data sources. Use this to help you home in on what you want, but cross-reference with a trusted source.
Sep 13th
Google Guide Quick Reference: Google Advanced... →
Sep 12th
Google Advanced Search →
Sep 12th
FedStats: MapStats →
Cross federal statistics with the State of the Cities Data System and you get a great way to dig into local data collected by the government.
Sep 12th
NYS Department of Correctional Services: Inmate... →
Sep 12th
FedStats →
Index of government statistics collected from more than 100 agencies.
Sep 12th
Advanced Twitter Search →
Sep 12th
Metblogs →
Blogger network covering more than 50 cities. The best way to know what bloggers are covering online is to talk to people. If you’re shy or new to your community, start digging here and look through linkrolls/blogrolls. And get over being shy.
Sep 12th
Public Access to Court Electronic Records (PACER) →
Sep 12th
NYPL, Digital Collections →
Sep 12th
New York Public Library - Databases and Indexes... →
“Library, library more than a book/ come find a new answer / come take a new look…”
Sep 12th
Excluded Parties List System →
“The purpose of EPLS is to provide a single comprehensive list of individuals and firms excluded by Federal government agencies from receiving federal contracts or federally approved subcontracts and…
Sep 12th
outside.in →
Look for news and blogs near you.
Sep 12th
Philadelphia Police Reports →
Philly puts its police reports online. If you don’t know what you’re looking for, you can search for up to 6 hours of reports within a single day. As the records are available online, you know the data files are also available. Ask the department (or send a FOIA letter) to get large quantities of records.
Sep 12th