Is the local training set representative of the provisory and final datasets?
It seems that the provisory dataset is much larger than the local one, given that the baseline takes much longer to run on it, although I could be wrong and the difference might just be due to the different platforms the code is run on.
And how much should one rely on the provisory score to choose a solver?
Thanks for the question.
1. The demo dataset included in the starting kit is not one of the public/feedback/final datasets; it is meant only as a sanity check. You might need to download the public datasets from the drive link. The public datasets might be a little smaller in scale, and they serve as a reference for local development.
2. Since these are AutoML solutions, you are encouraged to improve your solver's generalization on the public and feedback datasets in order to perform well in the final phase.
KDD Cup 2020 AutoGraph Organizing Team