We demonstrate a potential use of our FoodKG: answering natural language questions over a knowledge graph, also known as knowledge base question answering (KBQA). Given a natural language question such as "what Indian dishes can I make with chicken and garlic?", the goal is to automatically find answers in the FoodKG. We believe this is a natural way to access a large-scale knowledge graph, especially for non-expert users. It also lets the FoodKG benefit users by providing nutrition facts for ingredients and diverse recipe options in a user-friendly way. To this end, we build this application, Answering Natural Language Questions over FoodKG. We first create a synthetic Q&A dataset from the FoodKG using a set of manually designed question templates. We then train a state-of-the-art neural network-based KBQA model called BAMnet on the Q&A dataset. Once trained, the KBQA model is expected to answer similar natural language questions over the FoodKG.
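The template-based generation of the synthetic Q&A dataset can be illustrated with a minimal sketch. The templates, the helper names (join_ingredients, make_qa), and the sample answers below are hypothetical and made up for illustration; the actual templates are defined in the data builder code and may differ.

```python
import random

# Hypothetical question templates, for illustration only.
TEMPLATES = [
    "what {tag} dishes can I make with {ingredients}?",
    "which {tag} recipes contain {ingredients}?",
]


def join_ingredients(ingredients):
    """Render a list of names as 'a', 'a and b', or 'a, b and c'."""
    if len(ingredients) == 1:
        return ingredients[0]
    return ", ".join(ingredients[:-1]) + " and " + ingredients[-1]


def make_qa(tag, ingredients, answers, rng):
    """Fill a randomly chosen template and pair it with its answers."""
    template = rng.choice(TEMPLATES)
    question = template.format(tag=tag, ingredients=join_ingredients(ingredients))
    return {"question": question, "answers": sorted(answers)}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # The answer set here is invented sample data, not taken from FoodKG.
    print(make_qa("Indian", ["chicken", "garlic"], {"dish A", "dish B"}, rng))
```

In a real pipeline the tag, ingredients, and answers would come from the knowledge graph rather than being passed in by hand.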
Use virtualenv to manage your Python packages and environments.
Please take the following steps to create a python virtual environment.
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
We assume you have already loaded the FoodKG into Blazegraph.
If not, please follow the instructions in the User Guide to download, install, and load the FoodKG RDF data into the Blazegraph endpoint on your system.
Please also confirm that the variable USE_ENDPOINT_URL hard-coded in data_builder/src/config/data_config.py matches the URL and namespace of your Blazegraph instance.
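One way to check that the endpoint is reachable and holds data is to send it a trivial SPARQL ASK query. The sketch below uses only the Python standard library. The ENDPOINT_URL value is an assumed example, not necessarily the value in data_config.py, and the function names are hypothetical; substitute your own USE_ENDPOINT_URL.

```python
import json
import urllib.parse
import urllib.request

# Assumed example value; copy the real USE_ENDPOINT_URL from data_config.py.
ENDPOINT_URL = "http://localhost:9999/blazegraph/namespace/kb/sparql"


def build_ask_request(endpoint_url):
    """Build a SPARQL-protocol GET request asking whether any triple exists."""
    query = "ASK { ?s ?p ?o }"
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def endpoint_has_data(endpoint_url, timeout=10):
    """Return True if the endpoint answers the ASK query with true."""
    request = build_ask_request(endpoint_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("boolean"))


if __name__ == "__main__":
    try:
        print("endpoint has triples:", endpoint_has_data(ENDPOINT_URL))
    except (OSError, ValueError) as err:
        # Covers connection failures, HTTP errors, and non-JSON replies.
        print("endpoint check failed:", err)
```

A result of False, or a failure message, suggests that the URL or namespace does not match the instance the FoodKG was loaded into.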
python usda.py -o [qas_dir]
python recipe.py -o [qas_dir]
python filterout_recipe.py -recipe [qas_dir/recipe_kg.json] -o [qas_dir/sampled_recipe_kg.json] -max_dish_count_per_tag 2000
cat [qas_dir/usda_subgraphs.json] [qas_dir/sampled_recipe_kg.json] > [qas_dir/foodkg.json]
python generate_qa.py -usda [qas_dir/usda_subgraphs.json] -recipe [qas_dir/sampled_recipe_kg.json] -o [qas_dir] -sampling_prob 0.05 -num_qas_per_tag 20
python build_all_data.py -data_dir [qas_dir] -kb_path [qas_dir/foodkg.json] -out_dir [qas_dir]
In the message printed out, you will see some data statistics such as vocab_size, num_ent_types, num_relations. These numbers will be used later when modifying the config file.
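Copying the statistics into the config file can be done by hand. As a rough sketch, the helper below rewrites top-level "key: value" lines in a YAML-style text. It assumes the config keys carry the same names as the printed statistics, which may not hold for the actual config file, and the numbers shown are placeholders rather than real output.

```python
import re


def set_config_values(config_text, stats):
    """Replace top-level `key: value` lines whose key appears in stats.

    Lines with other keys, indented lines, and comment lines are kept.
    Inline comments on replaced lines are dropped.
    """
    updated = []
    for line in config_text.splitlines():
        match = re.match(r"^(\w+):", line)
        if match and match.group(1) in stats:
            key = match.group(1)
            line = f"{key}: {stats[key]}"
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    # Placeholder numbers; use the values printed by build_all_data.py.
    stats = {"vocab_size": 1000, "num_ent_types": 10, "num_relations": 20}
    example = "vocab_size: 0\nnum_ent_types: 0\nnum_relations: 0\nother_key: 1\n"
    print(set_config_values(example, stats))
```

To apply it to a real file, read the file into a string, pass it through set_config_values, and write the result back after checking the key names.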
python -m gensim.scripts.glove2word2vec --input glove.840B.300d.txt --output glove.840B.300d.w2v
python build_pretrained_w2v.py -emb glove.840B.300d.w2v -data_dir [qas_dir] -out [qas_dir/glove_pretrained_300d_w2v.npy] -emb_size 300
python train.py -config config/kbqa.yml
python run_online.py -config config/kbqa.yml