[DL Reading Group] ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
1. 1
DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
ToolLLM: Facilitating Large Language Models to
Master 16000+ Real-world APIs
Jeong Seong Cheol, M1, Matsuo Lab, The University of Tokyo
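15. Experiments
ToolBench is split into training and test data, and the test data is used to measure the generalization ability of ToolLLaMA.
Generalization is evaluated at three levels: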
1. Inst.: unseen instructions for the same set of tools in the training data
2. Tool: unseen tools that belong to the same (seen) category of the tools in the training data
3. Cat.: unseen tools that belong to a different (unseen) category of tools in the training data
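The three levels can be read as a rule on whether the tools and categories used by a test instruction appeared in the training data. A minimal sketch, assuming each test instruction lists its tools as (tool name, category) pairs; the function name `generalization_level` and the sample tool names are hypothetical and not taken from ToolBench:

```python
def generalization_level(test_tools, train_tools, train_categories):
    """Return 'Inst.', 'Tool', or 'Cat.' for one test instruction.

    test_tools: set of (tool_name, category) pairs used by the instruction.
    train_tools: set of tool names that appear in the training data.
    train_categories: set of categories that appear in the training data.
    """
    names = {name for name, _ in test_tools}
    categories = {category for _, category in test_tools}
    if names <= train_tools:
        # Every tool was seen in training; only the instruction is new.
        return "Inst."
    if categories <= train_categories:
        # At least one tool is unseen, but all categories were seen.
        return "Tool"
    # At least one tool comes from a category absent from training.
    return "Cat."


# Hypothetical example data.
train_tools = {"weather_api"}
train_categories = {"Weather"}
print(generalization_level({("forecast_api", "Weather")}, train_tools, train_categories))
```

Treating a mixed case (some seen and some unseen tools) as the harder level is a simplification made for this sketch.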
The evaluation covers three scenarios:
1. single-tool instructions (I1): instructions that use a single API
2. intra-category multi-tool instructions (I2): instructions that use 2 to 5 APIs from the same category
3. intra-collection multi-tool instructions (I3): instructions that use 2 to 5 APIs from the same collection
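To make the I2 and I3 scenarios concrete, the following sketch draws 2 to 5 APIs from a single group, where a group stands for either a category (I2) or a collection (I3). It is an illustration under stated assumptions, not the authors' sampling code: the helper `sample_api_subset`, the `groups` mapping, and all API names are hypothetical.

```python
import random


def sample_api_subset(groups, rng, low=2, high=5):
    """Pick one group and sample between `low` and `high` APIs from it.

    groups: dict mapping a group name (category or collection) to a list of API names.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    # Only groups with at least `low` APIs can supply a multi-API instruction.
    eligible = sorted(name for name, apis in groups.items() if len(apis) >= low)
    if not eligible:
        raise ValueError("no group has enough APIs")
    group = rng.choice(eligible)
    apis = groups[group]
    k = rng.randint(low, min(high, len(apis)))
    # rng.sample draws without replacement, so the subset has no duplicates.
    return group, rng.sample(apis, k)


# Hypothetical example data.
groups = {
    "Weather": ["current", "forecast", "alerts"],
    "Finance": ["quote", "history", "news", "fx", "crypto", "earnings"],
    "Sports": ["scores"],
}
group, subset = sample_api_subset(groups, random.Random(0))
print(group, subset)
```

Passing the random generator explicitly keeps the sampling reproducible, which helps when the same test split has to be regenerated.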
Baseline
• Vicuna and Alpaca with sophisticated prompt engineering ("We conduct sophisticated prompt engineering for both models to elicit the best of their tool-use abilities")
• ChatGPT (the teacher model) and Text-Davinci-003
Vicuna and Alpaca are weak open-source LLMs that probably trigger API errors constantly, which is presumably why their results could not be measured.