We combine multiple sources of data into a single cohesive knowledge graph, forming linkages to relate similar concepts.
Recipes can be derived from a multitude of sources, such as books, websites, and structured datasets. For the publication dataset, we chose a collection of recipes that was gathered by the authors of the Im2Recipe project and used in the making of that project.
Nutrient information is available in large quantities for a wide variety of foods. We chose to source our data from the USDA. To bring the data they provide into the knowledge graph, we took advantage of Semantic Data Dictionaries, an RPI project. The files used in the Semantic Data Dictionary process are available in this folder. The dictionary mapping file specifies all the linkages made to external ontologies, such as FoodOn and the Units Ontology.
We make available a sample of the FoodKG (USDA mappings) that was created using the Semantic Data Dictionary process.
You will need to manually acquire the following:
Building requires Python 3.7, because some of the prerequisite packages depend on packages bundled with Python 3.7.
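A quick way to confirm the interpreter before installing anything is a small version check. The following is a minimal sketch and not part of the repository; the names `REQUIRED`, `is_supported`, and `require_supported` are hypothetical, and the strict match on the minor version is our reading of the requirement above.

```python
import sys

# Assumption: the build expects exactly Python 3.7.x, as stated above.
REQUIRED = (3, 7)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor pair matches the required version."""
    return tuple(version_info[:2]) == tuple(required)


def require_supported(version_info=sys.version_info):
    """Raise RuntimeError with a readable message on a version mismatch."""
    if not is_supported(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(f"Python 3.7 is required to build; found {found}")
```

Calling `require_supported()` at the top of a setup script would stop early with a clear message instead of failing later during package installation.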
After cloning the repository, detailed instructions for reproduction are available under the /src directory. A broad overview follows:
The following two files are required from Recipe1M:
Other public data sources (e.g., USDA, FoodOn) are downloaded automatically by the script.
The final output consists of the serialized RDF data files that make up the FoodKG:
These files can be loaded into a graph database like BlazeGraph for executing the natural language queries.
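As an illustration of the loading step, the sketch below posts one RDF file to a SPARQL endpoint over HTTP using only the standard library. It rests on several assumptions that are not taken from this repository: the endpoint URL is the usual default of a locally running BlazeGraph instance, the file name `foodkg.ttl` is hypothetical, and the extension-to-content-type table covers only two common serializations, so adjust it to the format of the files you actually produce and to your store's documentation.

```python
import os
import urllib.request

# Assumption: a local BlazeGraph instance listening on its usual default URL.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

# Assumption: only two common serializations are mapped here; extend as needed.
CONTENT_TYPES = {
    ".ttl": "text/turtle",
    ".rdf": "application/rdf+xml",
}


def build_upload_request(path, data, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying RDF data as its body."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in CONTENT_TYPES:
        raise ValueError(f"No content type configured for {extension!r}")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": CONTENT_TYPES[extension]},
        method="POST",
    )


def upload(path, endpoint=ENDPOINT):
    """Read an RDF file from disk, send it to the endpoint, return the HTTP status."""
    with open(path, "rb") as handle:
        request = build_upload_request(path, handle.read(), endpoint)
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the database running, `upload("foodkg.ttl")` would send the file and return the HTTP status code. Building the request separately from sending it makes it possible to inspect the headers without a live server.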