1 | 22,332 | 63,432 | generalscraper | Scrapes Google |
2 | 24,156 | 63,432 | linkedindata | Scrapes all LinkedIn profiles including terms you specify. |
3 | 26,021 | 63,432 | jsontochart | Takes JSON files and outputs HTML for various types of charts
4 | 28,088 | 63,432 | entityextractor | Extracts entities and terms from any JSON. |
5 | 28,869 | 63,432 | linkedincrawler | Crawls public LinkedIn profiles via Google |
6 | 31,616 | 41,916 | dircrawl | Run block on all files in dir |
7 | 38,444 | 63,432 | wordcloud | Takes input and outputs the same text with word size changed based on frequency. |
8 | 39,421 | 63,432 | uploadconvert | Converts documents to the appropriate format for Transparency Toolkit. |
9 | 41,165 | 63,432 | linkedinparser | Parses public LinkedIn profiles |
10 | 41,869 | 63,432 | parsefile | OCRs files and extracts metadata using Apache Tika and Tesseract
11 | 45,819 | 63,432 | urlarchiver | Saves HTML and PDFs of websites.
12 | 48,441 | 63,432 | twittercrawler | Crawls Twitter |
13 | 50,228 | 63,432 | extractpatterns | Extracts entities and terms from any JSON. |
14 | 51,737 | 63,432 | timelinegen | TimelineGen generates JSON files for use as TimelineJS data. |
15 | 53,679 | 63,432 | sunlightcongress | Access to Sunlight Foundation's congress data. |
16 | 55,212 | 63,432 | indeedparser | Parses Indeed resumes |
17 | 58,554 | 63,432 | jsontonetworkgraph | Generates node and link data from any JSON. |
18 | 61,145 | 63,432 | piplrequest | Gets data from Pipl |
19 | 64,485 | 41,916 | tsjobcrawler | Crawls job listing websites for jobs requiring security clearance. |
20 | 67,738 | 18,158 | requestmanager | Manages proxies, wait intervals, etc |
21 | 72,120 | 30,305 | effscraper | Scrapes EFF court documents then extracts the plaintext and metadata. |
22 | 72,728 | 63,432 | countryconvert | Converts 2-char ISO country codes to 3-char. |
23 | 72,959 | 63,432 | termextractor | Extracts entities and terms from any JSON. |
24 | 77,170 | 63,432 | sunlightpartytime | Access to Sunlight Foundation's Party Time data. |
25 | 77,320 | 63,432 | jsontomap | Converts a JSON into a GeoJSON. |
26 | 77,512 | 63,432 | indeedcrawler | Crawls Indeed resumes |
27 | 84,640 | 63,432 | acluscraper | Scrapes ACLU court documents then extracts the plaintext and metadata. |
28 | 87,758 | 63,432 | jsoncrossreference | Cross-references JSONs and returns the matches
29 | 88,398 | 63,432 | piplcollector | Gets data from Pipl for dir of files |
30 | 90,305 | 63,432 | wlsearchscraper | Gets a list of documents from the WikiLeaks search that match certain terms. |
31 | 102,224 | 63,432 | datacalc | Some data calculation/manipulation for Transparency Toolkit. |
32 | 107,554 | 63,432 | jsoncombiner | Input multiple JSONs, get back one with all the data |
33 | 111,177 | 63,432 | jsontochoropleth | Converts a JSON to a world choropleth map.
34 | 112,135 | 30,305 | doc_integrity_check | Encrypts, verifies, and checks hashes of files |
35 | 113,210 | 63,432 | ttcalc | Calculation functions for Transparency Toolkit. |
36 | 113,914 | 63,432 | sigadparse | Extracts SIGADs from documents |
37 | 129,071 | 63,432 | harvesterreporter | Incremental result reporting for Transparency Toolkit |
38 | 143,298 | 63,432 | guardianscraper | Scrapes Guardian articles. |
39 | 145,381 | 63,432 | indeedscraper | Gets resumes and job listings from Indeed based on search terms and locations.
40 | 147,632 | 63,432 | nametoemail | Gets a list of possible email addresses. |
41 | 168,357 | 30,305 | docintegritycheck | Encrypts, verifies, and checks hashes of files |