October 2008
5 posts
Data Scraping Wikipedia with Google Spreadsheets |... →
“The Google spreadsheet function =importHTML(””,”table”,N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page, and the target table element both need…
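For a feel of what that spreadsheet function is doing, here is a minimal Python sketch that pulls the N-th table out of a chunk of HTML using only the standard library. It is not how Google implements it. The class name, the `import_table` helper and the sample markup are all made up for illustration, the index counts from 1 here, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, n):
    """Return the n-th table (counting from 1) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[n - 1]


# Hypothetical sample page with two tables.
SAMPLE = """
<table><tr><th>City</th><th>Population</th></tr>
<tr><td>Springfield</td><td>1000</td></tr></table>
<table><tr><td>second table</td></tr></table>
"""

rows = import_table(SAMPLE, 1)
```

Each row comes back as a list of strings, so it drops straight into the `csv` module if you want a file a spreadsheet can open.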
Election Protection - The Nation's Largest... →
“Throughout the election process, our volunteers - more than 10,000 strong - will be entering data and information into OurVote live (developed by the Electronic Frontier Foundation), an interactive…
Innovate this: Searching dirty →
“Information that is supposed to be private can sometimes inadvertently leak onto the web, through careless coding, or scanning, or editing, or incorrect placement on a server.”
Economic Data available for free online →
UK-focused but useful for U.S.-based reporting.
PHP: file_get_contents - Manual →
Add one insert query and you’re good to go.
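The same fetch-then-insert idea, sketched in Python rather than PHP. This assumes SQLite as the database and a throwaway `pages` table; the function names, table layout and example URL are placeholders, not anything from the PHP manual.

```python
import sqlite3
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its body as text (rough analogue of file_get_contents)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def save_page(conn, url, body):
    """Store one fetched page with a single parameterized INSERT."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, body TEXT)")
    conn.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (url, body))
    conn.commit()


# Demo without touching the network: an in-memory database and a hard-coded body.
conn = sqlite3.connect(":memory:")
save_page(conn, "http://example.com/", "<html>hello</html>")
stored = conn.execute("SELECT url, body FROM pages").fetchall()
```

In real use you would call `save_page(conn, url, fetch(url))`. The `?` placeholders matter: scraped text should never be pasted directly into a SQL string.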
September 2008
38 posts
Scraping Links With PHP | Makebeta →
Beautiful Soup documentation - Parse Tree →
Use the Parse Tree section of this Python-based parsing tool to scrape sites.
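If you just want to see the idea before installing anything, link scraping can be roughed out with Python's built-in parser. To be clear, this is not Beautiful Soup's API; it is a standard-library stand-in, and the class, function and sample page below are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no href attribute
            self.links.append(urljoin(self.base_url, href))


def scrape_links(html, base_url):
    """Return absolute URLs for all links found in the HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page: one relative link, one absolute link, one anchor without href.
PAGE = (
    '<p><a href="/about">About</a> '
    '<a href="http://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)
links = scrape_links(PAGE, "http://example.com/index.html")
```

Resolving relative links against the page URL up front saves grief later, when the list of links has been separated from the page it came from.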
How to do a Conditional HTTP GET with Python... →
Another Python script that can be useful for site scraping. This one first checks whether anything on a site has changed before starting the scrape.
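A rough sketch of how a conditional GET can look with Python's standard library, not the linked script itself. You send back the `ETag` and `Last-Modified` values the server gave you last time, and a 304 response means nothing has changed. The function names and the example URL and header values are assumptions for illustration.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_conditional_request(url, etag=None, last_modified=None):
    """Attach the validators saved from the previous fetch, if there are any."""
    request = Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified), or None if the server answers 304."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except HTTPError as error:
        if error.code == 304:  # Not Modified: nothing new to scrape
            return None
        raise


# Build a request only; nothing is sent over the network here.
request = build_conditional_request(
    "http://example.com/feed",
    etag='"abc123"',
    last_modified="Wed, 01 Oct 2008 00:00:00 GMT",
)
```

Save the returned `etag` and `last_modified` after each successful fetch and pass them in on the next run. Not every server sends these headers, in which case you get a full download every time.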
Searching a listserv's archives →
Need to search through thousands of listserv messages? Do it through email with this cheatsheet.
FedStats Data Access Tools →
This is a fantastic index of science and environment-related datasets produced by the U.S. government. Check out DataFerrett (linked on page), which is the data extraction tool. It’s not easy to use, but…
AFP reporters barred from using Wikipedia and... →
This is your warning not to use Wikipedia as a primary source. Too many reporters have been busted for using bad info from Wikipedia. I could show you examples, but I trust you only need to be warned…
Peter Shankman's "If I Can Help a Reporter Out" →
This is a great list developed by PR guy Peter Shankman. If you want to find an interviewee but don’t know where to turn, try posting a message to this email list. Be sure to fully vet the person who…
CataList, the official catalog of LISTSERV lists →
There are thousands of listservs (email discussion lists) that aren’t indexed by Google. This catalog is constantly updated with new public listservs. It’s a good place to dig for experts,…
WikiScanner: List anonymous wikipedia edits from... →
NIH Research Portfolio Online Reporting Tool... →
“Reports, data and analyses of NIH research and development activities.” The National Institutes of Health runs many, many research programs on public health. This is a good place to go to start…
Data Access Tools from the Census Bureau →
The U.S. Census Bureau collects raw data, mapping information and other useful demographic info. Great for understanding the big picture of your city, county, state and nation.
CQ.com - MoneyLine →
Discover the influence of money on the political process.
Members of Congress who Twitter - Congresspedia →
It’s true. Our congressmen and women are getting into the social media space. The list is pretty accurate, but as it’s a wiki, it carries the same caveats as all wiki sites: verify the info with…
District of Columbia Data Catalog and Data Feeds →
Lots of state and local governments provide data not just in .csv format but also in RSS, XML and KML (spatial data, otherwise known as geodata for mapping). This is one example.
BRB Publication's Public Record Resource Center →
This extremely useful index allows you to drill down to free data sources from your state, county and locality.
FactCheck.org →
Super Video Converter →
Converts almost any video format into QuickTime standard video. Very useful, per Emi Kolawole. PC only.
Campaign Ad Spotlight →
Real Player →
Emi Kolawole recommended Real Player. It has a Flash video download feature that will allow you to retain evidence of the existence of a video.
Budget FY 2009 - Appendix Table of Contents →
The appendix is the budget breakdown, line by line. Very useful for seeing how government plans to spend taxpayer money.
land records - Google Search →
Messy Google search that will bring you to land records links. Limit this to your area by including the name of your state, county or city in the search box.
Pipl - People Search →
This is another pretty good database for finding background information about people. Like ZabaSearch, you’ll need to do further verification to know that the information is sound, but it’s another…
FriendCSV | Facebook →
This Facebook app lets you pull down clean, organized data about your Facebook friends for import in-house. It’s useful, but be cautious in using it.
Electoral-vote.com: President, Senate, House... →
Free People Search by ZabaSearch →
This is a useful search engine for finding people, addresses and phone numbers when you don’t know anything about your subject. ZabaSearch claims to crawl public records, but it doesn’t disclose its data sources. Use this to help you home in on what you want, but cross-reference with a trusted source.
Google Guide Quick Reference: Google Advanced... →
Google Advanced Search →
FedStats: MapStats →
Cross federal statistics with the State of the Cities Data System and you get a great way to dig into local data collected by the government.
NYS Department of Correctional Services: Inmate... →
FedStats →
Index of government statistics collected from more than 100 agencies.
Advanced Twitter Search →
Metblogs →
Blogger network covering more than 50 cities. The best way to know what bloggers are covering online is to talk to people. If you’re shy or new to your community, start digging here and look through linkrolls/blogrolls. And get over being shy.
Public Access to Court Electronic Records (PACER) →
NYPL, Digital Collections →
New York Public Library - Databases and Indexes... →
“Library, library more than a book/ come find a new answer / come take a new look…”
Excluded Parties List System →
“The purpose of EPLS is to provide a single comprehensive list of individuals and firms excluded by Federal government agencies from receiving federal contracts or federally approved subcontracts and…
outside.in →
Look for news and blogs near you.
Philadelphia Police Reports →
Philly puts its police reports online. If you don’t know what you’re looking for, you can search for up to 6 hours of reports within a single day. As the records are available online, you know the data files are also available. Ask the department (or send a FOIA letter) to get large quantities of records.