1 | 21,961 | 64,836 | generalscraper | Scrapes Google |
2 | 23,735 | 143,384 | linkedindata | Scrapes all LinkedIn profiles including terms you specify. |
3 | 25,514 | 49,088 | jsontochart | Takes JSON files and outputs HTML for various types of charts |
4 | 27,581 | 64,836 | entityextractor | Extracts entities and terms from any JSON. |
5 | 28,501 | 143,384 | linkedincrawler | Crawls public LinkedIn profiles via Google |
6 | 31,165 | 64,836 | dircrawl | Runs a block on all files in a directory |
7 | 37,910 | 143,384 | wordcloud | Takes input and outputs the same text with word size changed based on frequency. |
8 | 38,650 | 143,384 | uploadconvert | Converts documents to the appropriate format for Transparency Toolkit. |
9 | 40,639 | 143,384 | linkedinparser | Parses public LinkedIn profiles |
10 | 41,404 | 20,865 | parsefile | OCR file and extract metadata using Apache Tika and Tesseract |
11 | 45,102 | 64,836 | urlarchiver | Saves HTML and PDFs of websites. |
12 | 48,216 | 143,384 | twittercrawler | Crawls Twitter |
13 | 49,858 | 64,836 | extractpatterns | Extracts entities and terms from any JSON. |
14 | 50,850 | 143,384 | timelinegen | TimelineGen generates JSON files for use as TimelineJS data. |
15 | 52,890 | 27,459 | sunlightcongress | Access to Sunlight Foundation's congress data. |
16 | 54,944 | 64,836 | indeedparser | Parses Indeed resumes |
17 | 57,833 | 49,088 | jsontonetworkgraph | Generates node and link data from any JSON. |
18 | 60,594 | 27,459 | piplrequest | Gets data from Pipl |
19 | 64,243 | 143,384 | tsjobcrawler | Crawls job listing websites for jobs requiring security clearance. |
20 | 67,189 | 29,669 | requestmanager | Manages proxies, wait intervals, etc |
21 | 71,170 | 64,836 | effscraper | Scrapes EFF court documents then extracts the plaintext and metadata. |
22 | 71,861 | 64,836 | countryconvert | Converts 2-char ISO country codes to 3-char. |
23 | 72,176 | 143,384 | termextractor | Extracts entities and terms from any JSON. |
24 | 76,127 | 36,098 | sunlightpartytime | Access to Sunlight Foundation's Party Time data. |
25 | 76,545 | 49,088 | jsontomap | Converts a JSON into a GeoJSON. |
26 | 77,317 | 64,836 | indeedcrawler | Crawls Indeed resumes |
27 | 83,902 | 64,836 | acluscraper | Scrapes ACLU court documents then extracts the plaintext and metadata. |
28 | 86,736 | 64,836 | jsoncrossreference | Cross-references JSONs and returns the matches |
29 | 87,800 | 36,098 | piplcollector | Gets data from Pipl for dir of files |
30 | 89,199 | 143,384 | wlsearchscraper | Gets a list of documents from the WikiLeaks search that match certain terms. |
31 | 101,086 | 64,836 | datacalc | Some data calculation/manipulation for Transparency Toolkit. |
32 | 106,396 | 49,088 | jsoncombiner | Input multiple JSONs, get back one with all the data |
33 | 110,044 | 49,088 | jsontochoropleth | Converts a JSON to a world choropleth map. |
34 | 111,840 | 143,384 | ttcalc | Calculation functions for Transparency Toolkit. |
35 | 112,113 | 49,088 | doc_integrity_check | Encrypts, verifies, and checks hashes of files |
36 | 112,569 | 49,088 | sigadparse | Extracts SIGADs from documents |
37 | 128,980 | 64,836 | harvesterreporter | Incremental result reporting for Transparency Toolkit |
38 | 141,707 | 64,836 | guardianscraper | Scrapes Guardian articles. |
39 | 143,940 | 64,836 | indeedscraper | Gets resumes and job listings from Indeed based on search terms and locations. |
40 | 146,147 | 143,384 | nametoemail | Gets a list of possible email addresses. |
41 | 167,042 | 64,836 | docintegritycheck | Encrypts, verifies, and checks hashes of files |