Environmental Information Disclosure and Monitoring Project

by 綠色公民行動聯盟 (Green Citizens' Action Alliance)

Data fields

https://docs.google.com/spreadsheets/d/1XaU4R6EtzCMZmKN6CdHurvMINdOYQkkbgbLJT5SQgQ8/edit#gid=0

Kaohsiung City Government continuous stack emissions monitoring open data fields:

https://docs.google.com/spreadsheets/d/1My4nux9wZ6_v7I_lL8NZrSwp4ECWXa6UPl-kRGGuPDQ/edit?usp=sharing

November open data workshop notes:

http://beta.hackfoldr.org/1VxkIpLVa4ZK-MIFcTy7DH-Oqson14N9j2QqjlsnGYao

Ma Jun (馬軍) NGO sharing session

https://hackpad.com/2015.09.10--lGAFJzYZmrU

Reference materials

Data from Chinese environmental groups:

Basic company data:

Other data on corporate networks:

卞中佩's compilation of corporate pollution data:

公開資訊觀測站 (Market Observation Post System):

http://mops.twse.com.tw/mops/web/index

Ministry of Economic Affairs factory data:

http://gcis.nat.gov.tw/Fidbweb/index.jsp

Finjon Kiang: https://wdc.gov.tw/syi.idb/

Penalty records:

http://prtr.epa.gov.tw/FacilityInfo/Data?search=False
- A copy has been pulled down to https://github.com/kiang/prtr.epa.gov.tw
http://prtr.epa.gov.tw/Penalty/Statistics (offline)
日月光 (ASE) sample:

Pollution status and real-time monitoring:

Tools / Code:

http://docs.casperjs.org/en/latest/quickstart.html#now-let-s-scrape-google

GIT Repo:

https://github.com/swilsonian/pollutionscraper

Questions:

How do we pass search keywords to our script? As a text file with one keyword per line?
How do we prevent duplicate data? For example, one keyword in the keywords text file is "日月光" and another is "新日月光".
- Both keywords can match the same factory, which will cause duplicate rows in our CSV
- We could track factory Entity IDs in a hash table, since each factory has an Entity ID
- Before opening a factory page, check whether that factory's Entity ID is already in the table; if it is, don't open the page again
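The Entity ID check above could be sketched roughly as follows. This is only an illustration in Python (the scraper itself may end up in CasperJS), using a set in place of a hash table. The names `search_results` and `fetch_factory_page`, and the shape of the input pairs, are assumptions made for the example and not part of the existing script.

```python
def scrape_unique_factories(search_results, fetch_factory_page):
    """Fetch each factory page at most once, keyed by its Entity ID.

    search_results: iterable of (keyword, entity_id) pairs (hypothetical shape).
    fetch_factory_page: callable that takes an Entity ID and returns one row
    (hypothetical; stands in for the real page scraper).
    """
    seen_ids = set()  # Entity IDs of factory pages already opened
    rows = []
    for keyword, entity_id in search_results:
        if entity_id in seen_ids:
            # Another keyword already led to this factory, so skip it.
            continue
        seen_ids.add(entity_id)
        rows.append(fetch_factory_page(entity_id))
    return rows
```

With this approach a factory matched by both "日月光" and "新日月光" is opened once, under whichever keyword reaches it first.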
Besides the one table, what other data needs to be retrieved from each factory page?
Should we add a CSV row for a factory that doesn’t have any table data?
How many CSV files do we want to create? One per keyword, or one per search?
How do the CSVs we create get uploaded to the final storage repo?
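As one possible starting point for the keyword-input and CSV questions, here is a minimal Python sketch. It assumes a UTF-8 keywords file with one keyword per line and writes all rows to a single CSV; neither choice has been decided. The function names, file paths, and column names are hypothetical.

```python
import csv


def load_keywords(path):
    """Read one keyword per line, skipping blank lines and exact repeats."""
    keywords = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            keyword = line.strip()
            if keyword and keyword not in keywords:
                keywords.append(keyword)
    return keywords


def write_rows(path, rows, fieldnames):
    """Write a list of row dicts to one CSV file, with a header row."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

If one CSV per keyword turns out to be preferable, `write_rows` could simply be called once per keyword with a different output path.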