1 | 23,042 | 49,224 | generalscraper | Scrapes Google |
2 | 24,917 | 32,129 | linkedindata | Scrapes all LinkedIn profiles that include the terms you specify. |
3 | 26,817 | 27,130 | jsontochart | Takes JSON files and outputs HTML for various types of charts |
4 | 28,920 | 36,338 | entityextractor | Extracts entities and terms from any JSON. |
5 | 29,624 | 59,263 | linkedincrawler | Crawls public LinkedIn profiles via Google |
6 | 32,457 | 33,380 | dircrawl | Runs a block on all files in a directory |
7 | 39,340 | 44,085 | wordcloud | Takes input and outputs the same text with word size changed based on frequency. |
8 | 40,293 | 29,792 | uploadconvert | Converts documents to the appropriate format for Transparency Toolkit. |
9 | 41,813 | 29,792 | linkedinparser | Parses public LinkedIn profiles |
10 | 42,837 | 39,837 | parsefile | OCRs a file and extracts metadata using Apache Tika and Tesseract |
11 | 46,890 | 73,587 | urlarchiver | Saves HTML and PDFs of websites. |
12 | 48,945 | 28,864 | twittercrawler | Crawls Twitter |
13 | 51,059 | 46,526 | extractpatterns | Extracts entities and terms from any JSON. |
14 | 52,686 | 27,938 | timelinegen | TimelineGen generates JSON files for use as TimelineJS data. |
15 | 54,771 | 41,906 | sunlightcongress | Access to Sunlight Foundation's congress data. |
16 | 56,120 | 44,085 | indeedparser | Parses Indeed resumes |
17 | 59,559 | 52,232 | jsontonetworkgraph | Generates node and link data from any JSON. |
18 | 61,754 | 39,837 | piplrequest | Gets data from Pipl |
19 | 65,090 | 44,085 | tsjobcrawler | Crawls job listing websites for jobs requiring security clearance. |
20 | 68,898 | 52,232 | requestmanager | Manages proxies, wait intervals, etc |
21 | 73,337 | 52,232 | effscraper | Scrapes EFF court documents then extracts the plaintext and metadata. |
22 | 73,858 | 46,526 | termextractor | Extracts entities and terms from any JSON. |
23 | 73,903 | 49,224 | countryconvert | Converts 2-char ISO country codes to 3-char. |
24 | 78,454 | 59,263 | indeedcrawler | Crawls Indeed resumes |
25 | 78,511 | 63,493 | jsontomap | Converts a JSON into a GeoJSON. |
26 | 78,624 | 86,482 | sunlightpartytime | Access to Sunlight Foundation's Party Time data. |
27 | 85,850 | 46,526 | acluscraper | Scrapes ACLU court documents then extracts the plaintext and metadata. |
28 | 89,011 | 63,493 | jsoncrossreference | Cross-references JSONs and returns the matches |
29 | 89,130 | 73,587 | piplcollector | Gets data from Pipl for dir of files |
30 | 91,698 | 161,562 | wlsearchscraper | Gets a list of documents from the WikiLeaks search that match certain terms. |
31 | 103,635 | 68,264 | datacalc | Some data calculation/manipulation for Transparency Toolkit. |
32 | 109,009 | 79,620 | jsoncombiner | Input multiple JSONs, get back one with all the data |
33 | 112,541 | 68,264 | doc_integrity_check | Encrypts, verifies, and checks hashes of files |
34 | 112,662 | 86,482 | jsontochoropleth | Converts a JSON to a world choropleth map. |
35 | 114,522 | 79,620 | ttcalc | Calculation functions for Transparency Toolkit. |
36 | 115,291 | 104,717 | sigadparse | Extracts SIGADs from documents |
37 | 130,201 | 117,132 | harvesterreporter | Incremental result reporting for Transparency Toolkit |
38 | 144,991 | 117,132 | guardianscraper | Scrapes Guardian articles. |
39 | 147,091 | 131,144 | indeedscraper | Gets resumes and job listings from Indeed based on search terms and locations. |
40 | 148,810 | 117,132 | nametoemail | Gets a list of possible email addresses. |
41 | 169,778 | 117,132 | docintegritycheck | Encrypts, verifies, and checks hashes of files |