1 | 21,973 | 9,761 | generalscraper | Scrapes Google |
2 | 23,704 | 20,872 | linkedindata | Scrapes all LinkedIn profiles that include the terms you specify. |
3 | 25,525 | 23,834 | jsontochart | Takes JSON files and outputs HTML for various types of charts |
4 | 27,618 | 25,591 | entityextractor | Extracts entities and terms from any JSON. |
5 | 28,522 | 17,287 | linkedincrawler | Crawls public LinkedIn profiles via Google |
6 | 31,176 | 11,656 | dircrawl | Runs a block on all files in a directory |
7 | 37,772 | 94,831 | wordcloud | Takes input and outputs the same text with word size changed based on frequency. |
8 | 38,699 | 94,831 | uploadconvert | Converts documents to the appropriate format for Transparency Toolkit. |
9 | 40,683 | 29,054 | linkedinparser | Parses public LinkedIn profiles |
10 | 41,428 | 16,842 | parsefile | OCR file and extract metadata using Apache Tika and Tesseract |
11 | 45,164 | 135,887 | urlarchiver | Saves HTML and PDFs of websites. |
12 | 48,164 | 17,685 | twittercrawler | Crawls Twitter |
13 | 49,916 | 35,772 | extractpatterns | Extracts entities and terms from any JSON. |
14 | 50,953 | 135,887 | timelinegen | TimelineGen generates JSON files for use as TimelineJS data. |
15 | 52,936 | 94,831 | sunlightcongress | Access to Sunlight Foundation's congress data. |
16 | 55,005 | 38,118 | indeedparser | Parses Indeed resumes |
17 | 57,917 | 44,572 | jsontonetworkgraph | Generates node and link data from any JSON. |
18 | 60,613 | 21,572 | piplrequest | Gets data from Pipl |
19 | 64,246 | 29,054 | tsjobcrawler | Crawls job listing websites for jobs requiring security clearance. |
20 | 67,261 | 135,887 | requestmanager | Manages proxies, wait intervals, etc |
21 | 71,253 | 54,969 | effscraper | Scrapes EFF court documents then extracts the plaintext and metadata. |
22 | 71,915 | 54,969 | countryconvert | Converts 2-char ISO country codes to 3-char. |
23 | 72,196 | 135,887 | termextractor | Extracts entities and terms from any JSON. |
24 | 76,238 | 135,887 | sunlightpartytime | Access to Sunlight Foundation's Party Time data. |
25 | 76,622 | 54,969 | jsontomap | Converts a JSON into a GeoJSON. |
26 | 77,371 | 49,264 | indeedcrawler | Crawls Indeed resumes |
27 | 83,365 | 75,441 | acluscraper | Scrapes ACLU court documents then extracts the plaintext and metadata. |
28 | 86,833 | 63,366 | jsoncrossreference | Cross-references JSONs and returns the matches |
29 | 87,847 | 54,969 | piplcollector | Gets data from Pipl for a directory of files |
30 | 89,299 | 94,831 | wlsearchscraper | Gets a list of documents from the WikiLeaks search that match certain terms. |
31 | 101,009 | 75,441 | datacalc | Some data calculation/manipulation for Transparency Toolkit. |
32 | 106,538 | 75,441 | jsoncombiner | Input multiple JSONs, get back one with all the data |
33 | 110,164 | 75,441 | jsontochoropleth | Converts a JSON to a world choropleth map. |
34 | 112,000 | 135,887 | ttcalc | Calculation functions for Transparency Toolkit. |
35 | 112,140 | 63,366 | doc_integrity_check | Encrypts, verifies, and checks hashes of files |
36 | 112,709 | 135,887 | sigadparse | Extracts SIGADs from documents |
37 | 128,896 | 38,118 | harvesterreporter | Incremental result reporting for Transparency Toolkit |
38 | 141,868 | 94,831 | guardianscraper | Scrapes Guardian articles. |
39 | 144,076 | 94,831 | indeedscraper | Gets resumes and job listings from Indeed based on search terms and locations. |
40 | 146,219 | 94,831 | nametoemail | Gets a list of possible email addresses. |
41 | 167,171 | 94,831 | docintegritycheck | Encrypts, verifies, and checks hashes of files |