It depends entirely on what you are trying to accomplish. If you a programming straight chatbot systems, accuracy may be your best metric. If you have moved to agent-oriented systems, a traditional llm will generally not have the ability to perform the complex chain of thought reasoning across multiple dimensions/steps.

Responses (1)