Big (public) data is great, but sometimes a little bit of complementary *small*, non-public data is vital. I proposed this at the Data Science Conference last Sunday, and Audrey Tang suggested that I dig the hole myself, i.e. get it started. So here you go: a pipeline from the source to full disclosure.
(1) source --> secure channel
(2) --> secure database
(3) --> internal vetting with known information
(4) --> encrypted publication
(5) --> full disclosure
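To make the order of the five stages explicit, here is a small sketch that models them as a one-way state machine. A submission enters through the secure channel and can only move forward one stage at a time, so nothing reaches publication before it has been vetted. The class and stage names are hypothetical labels of mine, not part of any existing system.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five stages of the pipeline, in order (hypothetical names)."""
    SECURE_CHANNEL = 1
    SECURE_DATABASE = 2
    INTERNAL_VETTING = 3
    ENCRYPTED_PUBLICATION = 4
    FULL_DISCLOSURE = 5


class Submission:
    """A hypothetical record that may only move forward one stage at a time."""

    def __init__(self, payload: str) -> None:
        self.payload = payload
        self.stage = Stage.SECURE_CHANNEL  # data always enters via the secure channel

    def advance(self) -> Stage:
        # There is no way to jump ahead, e.g. publishing before vetting.
        if self.stage == Stage.FULL_DISCLOSURE:
            raise ValueError("already fully disclosed")
        self.stage = Stage(self.stage + 1)
        return self.stage
```

Calling `advance()` four times walks a submission from the secure channel to full disclosure; a fifth call raises `ValueError`.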
There are well-known technologies for (1) and (2), such as TLS for the channel and encryption at rest for the database.
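As a minimal sketch, assuming the client side is written in Python: the standard library's `ssl` module covers the channel in (1). For (2), real storage should also be encrypted at rest, which the standard library cannot do on its own, so the sketch only adds HMAC tags for spotting tampered records. The function names are my own.

```python
import hashlib
import hmac
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS settings for the secure channel in (1)."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def tag_record(key: bytes, record: bytes) -> str:
    """HMAC-SHA256 tag stored next to each record in (2) to detect tampering."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def verify_record(key: bytes, record: bytes, tag: str) -> bool:
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(tag_record(key, record), tag)
```

To actually connect, the context can be passed to `ctx.wrap_socket(sock, server_hostname=...)` or to `urllib.request.urlopen(url, context=ctx)`.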
(3) is up to the implementation.
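How (3) works depends on what the organization already knows. As one hypothetical illustration, vetting could start by comparing each claim in a submission against information already on file and sorting it into consistent, contradictory or unverifiable. The function and field names are made up for this sketch.

```python
def vet(claims: dict[str, str], known: dict[str, str]) -> dict[str, list[str]]:
    """Sort a submission's claims by how they compare with information already held."""
    report: dict[str, list[str]] = {"consistent": [], "contradicts": [], "unverified": []}
    for field, value in claims.items():
        if field not in known:
            report["unverified"].append(field)    # nothing on file to check against
        elif known[field] == value:
            report["consistent"].append(field)    # matches what we already know
        else:
            report["contradicts"].append(field)   # conflicts with known information
    return report
```

Contradictions are the interesting output: they are either a sign of a bad source or a sign that what is "known" is wrong.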
(4) could follow the idea of SCIpher, for example, which hides messages inside innocuous-looking generated text.
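Here is a toy sketch of that general idea: encrypt first, then hide the ciphertext in ordinary-looking sentences. Each of eight word slots has two interchangeable choices, so one sentence carries one byte. The word lists, function names and the SHA-256-based keystream are my own illustrative assumptions, not SCIpher's actual scheme, and the keystream is NOT a vetted cipher; a real deployment should use an established authenticated cipher.

```python
import hashlib

# Eight slots, two single-word choices each; the choice in slot i is bit i of a byte.
SLOTS = [
    ("Researchers", "Authors"),
    ("should", "may"),
    ("submit", "send"),
    ("novel", "original"),
    ("papers", "manuscripts"),
    ("on", "about"),
    ("scalable", "robust"),
    ("systems", "networks"),
]


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256(key || counter). Illustration only, NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def xor(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key undoes it.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def hide(data: bytes) -> str:
    """Turn each byte into one sentence, most significant bit first."""
    sentences = []
    for byte in data:
        words = [SLOTS[i][(byte >> (7 - i)) & 1] for i in range(8)]
        sentences.append(" ".join(words) + ".")
    return " ".join(sentences)


def reveal(text: str) -> bytes:
    """Read the bytes back from the word choices, eight words per byte."""
    tokens = text.replace(".", "").split()
    out = bytearray()
    for start in range(0, len(tokens), 8):
        byte = 0
        for i, token in enumerate(tokens[start:start + 8]):
            byte = (byte << 1) | SLOTS[i].index(token)
        out.append(byte)
    return bytes(out)


def publish(message: bytes, key: bytes) -> str:
    return hide(xor(message, key))


def recover(text: str, key: bytes) -> bytes:
    return xor(reveal(text), key)
```

`publish(b"meet at noon", b"shared key")` yields twelve sentences, one per byte, and `recover` with the same key returns the original bytes. The cost is obvious: the cover text is dozens of times longer than the message.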