| 1 | 23,302 | 143,427 | generalscraper | Scrapes Google |
| 2 | 25,164 | 77,741 | linkedindata | Scrapes all LinkedIn profiles including terms you specify. |
| 3 | 27,204 | 143,427 | jsontochart | Takes JSON files and outputs HTML for various types of charts. |
| 4 | 29,166 | 92,912 | entityextractor | Extracts entities and terms from any JSON. |
| 5 | 29,908 | 143,427 | linkedincrawler | Crawls public LinkedIn profiles via Google |
| 6 | 32,590 | 111,035 | dircrawl | Runs a block on all files in a directory. |
| 7 | 39,403 | 30,171 | wordcloud | Takes input and outputs the same text with word size changed based on frequency. |
| 8 | 40,329 | 27,645 | uploadconvert | Converts documents to the appropriate format for Transparency Toolkit. |
| 9 | 41,715 | 77,741 | linkedinparser | Parses public LinkedIn profiles |
| 10 | 42,974 | 111,035 | parsefile | OCRs files and extracts metadata using Apache Tika and Tesseract. |
| 11 | 47,117 | 35,503 | urlarchiver | Saves HTML and PDFs of websites. |
| 12 | 48,701 | 33,465 | twittercrawler | Crawls Twitter |
| 13 | 51,032 | 77,741 | extractpatterns | Extracts entities and terms from any JSON. |
| 14 | 52,855 | 92,912 | timelinegen | TimelineGen generates JSON files for use as TimelineJS data. |
| 15 | 55,087 | 67,069 | sunlightcongress | Access to Sunlight Foundation's congress data. |
| 16 | 56,112 | 111,035 | indeedparser | Parses Indeed resumes |
| 17 | 59,735 | 111,035 | jsontonetworkgraph | Generates node and link data from any JSON. |
| 18 | 61,533 | 92,912 | piplrequest | Gets data from Pipl |
| 19 | 64,510 | 40,670 | tsjobcrawler | Crawls job listing websites for jobs requiring security clearance. |
| 20 | 69,733 | 143,427 | requestmanager | Manages proxies, wait intervals, etc |
| 21 | 73,734 | 111,035 | effscraper | Scrapes EFF court documents then extracts the plaintext and metadata. |
| 22 | 74,158 | 143,427 | countryconvert | Converts 2-char ISO country codes to 3-char. |
| 23 | 74,313 | 77,741 | termextractor | Extracts entities and terms from any JSON. |
| 24 | 78,551 | 111,035 | indeedcrawler | Crawls Indeed resumes |
| 25 | 78,764 | 143,427 | jsontomap | Converts a JSON into a GeoJSON. |
| 26 | 79,169 | 92,912 | sunlightpartytime | Access to Sunlight Foundation's Party Time data. |
| 27 | 86,426 | 67,069 | acluscraper | Scrapes ACLU court documents then extracts the plaintext and metadata. |
| 28 | 89,195 | 143,427 | piplcollector | Gets data from Pipl for a directory of files. |
| 29 | 89,998 | 143,427 | jsoncrossreference | Cross-references JSONs and returns the matches. |
| 30 | 93,393 | 59,042 | wlsearchscraper | Gets a list of documents from the WikiLeaks search that match certain terms. |
| 31 | 104,670 | 143,427 | datacalc | Some data calculation/manipulation for Transparency Toolkit. |
| 32 | 110,211 | 143,427 | jsoncombiner | Input multiple JSONs, get back one with all the data |
| 33 | 111,882 | 111,035 | doc_integrity_check | Encrypts, verifies, and checks hashes of files |
| 34 | 113,120 | 111,035 | jsontochoropleth | Converts a JSON to a world choropleth map. |
| 35 | 114,812 | 92,912 | ttcalc | Calculation functions for Transparency Toolkit. |
| 36 | 115,462 | 111,035 | sigadparse | Extracts SIGADs from documents |
| 37 | 130,148 | 143,427 | harvesterreporter | Incremental result reporting for Transparency Toolkit |
| 38 | 146,191 | 143,427 | guardianscraper | Scrapes Guardian articles. |
| 39 | 147,727 | 111,035 | indeedscraper | Gets resumes and job listings from Indeed based on search terms and locations. |
| 40 | 148,691 | 143,427 | nametoemail | Gets a list of possible email addresses. |
| 41 | 170,436 | 92,912 | docintegritycheck | Encrypts, verifies, and checks hashes of files |