Proprietary Systems =================== ``etalon`` can benchmark the performance of LLM Inference Systems that are exposed as public APIs. The following sections describe how to benchmark these systems. .. note:: Custom tokenizer corresponding to the model is fetched from Hugging Face hub. Make sure you have access to the model and are logged in to Hugging Face. Check :ref:`huggingface_setup` for more details. Export API Key and URL ~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: shell export OPENAI_API_BASE=https://api.endpoints.anyscale.com/v1 export OPENAI_API_KEY=secret_abcdefg Running Benchmark ~~~~~~~~~~~~~~~~~ .. code-block:: shell python -m etalon.run_benchmark \ --model "meta-llama/Meta-Llama-3-8B-Instruct" \ --max-num-completed-requests 20 \ --request-interval-generator-provider "gamma" \ --request-length-generator-provider "zipf" \ --request-generator-max-tokens 8192 \ --output-dir "results" Be sure to update ``--model`` flag to the model used in the proprietary system. .. note:: ``etalon`` supports different generator providers for request interval and request length. For more details, refer to :doc:`../guides/request_generator_providers`. .. _wandb_args_proprietary_systems: Specifying wandb args [Optional] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Optionally, you can also specify the following arguments to log results to wandb: .. code-block:: shell --should-write-metrics \ --wandb-project Project \ --wandb-group Group \ --wandb-run-name Run Other Arguments ^^^^^^^^^^^^^^^ There are many more arguments for running benchmark, run the following to know more: .. code-block:: shell python -m etalon.run_benchmark -h