As more and more tools for pen testing large language models appear on GitHub, I thought it would be worth trying some of them out to see how they work and what their outputs look like.
To kick things off, let's take a look at Giskard, which can be found here on GitHub:
This Python tool can be run in a Jupyter Notebook instance and can test for a range of LLM vulnerabilities such as bias, hallucinations, prompt injection attacks, harmful outputs and more.
We begin by running Giskard in a Google Colab environment and installing the required dependencies:
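A minimal install cell might look like the one below. Beyond "giskard[llm]" itself, the extra packages are my assumptions for a typical LangChain/FAISS retrieval setup like the one used later, so adjust to your own stack:

```python
# Install Giskard with its LLM extras, plus the packages assumed below for
# building a small retrieval-augmented QA chain (LangChain, FAISS, PDF loading).
%pip install --upgrade "giskard[llm]" langchain openai faiss-cpu pypdf tiktoken
```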
Set up your OpenAI API key (as we are using GPT-3.5):
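Something along these lines works in Colab; both the GPT-3.5 model under test and Giskard's LLM-assisted detectors pick the key up from the standard OpenAI environment variable:

```python
import os
from getpass import getpass

# Prompt for the key rather than hard-coding it into the notebook.
# Giskard's detectors and the target model both read OPENAI_API_KEY.
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```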
Now let's set up the model and test it.
We are using the IPCC's climate change report as the 'knowledge base'.
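Here is a rough sketch of that setup, assuming an older LangChain-style RetrievalQA chain over the IPCC report (the report URL and chunking parameters are illustrative, not taken from the tool's docs). The important part is the last step: wrapping the chain in giskard.Model so the scanner knows how to feed it questions and read its answers.

```python
import giskard
import pandas as pd
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Build a small retrieval index over the IPCC report (illustrative URL).
IPCC_REPORT_URL = "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf"
documents = PyPDFLoader(IPCC_REPORT_URL).load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
)
vector_store = FAISS.from_documents(documents, OpenAIEmbeddings())

# A simple question-answering chain backed by GPT-3.5.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=vector_store.as_retriever())

# Wrap the chain for Giskard: it will pass a DataFrame with a "question"
# column and expects one generated answer per row.
def model_predict(df: pd.DataFrame) -> list:
    return [chain.run(question) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="IPCC Climate Change QA",
    description="Answers questions about climate change based on the IPCC report.",
    feature_names=["question"],
)
```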
Now we can scan the LLM:
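A scan can be as simple as the snippet below. For a quick first pass I restrict it to the hallucination detectors via the `only` argument (my choice here, not a requirement):

```python
# Quick scan limited to the hallucination detector group.
results = giskard.scan(giskard_model, only=["hallucination"])

# In a notebook the report renders as an interactive widget.
display(results)
```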
The initial scan results show one hallucination issue, and what's really cool is that Giskard shows you the prompt it used and explains the hallucination - ideal information for a pentest report, I think!
Now we run the full scan to produce the complete report:
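The full report is just a scan with no detector filter, and the results can be exported to HTML, which is handy for attaching to a pentest deliverable (the output filename is my own choice):

```python
# Full scan: runs all available LLM detectors (injection, harmfulness,
# stereotypes, hallucination, and so on). This can take a while and
# consumes OpenAI API credits.
full_results = giskard.scan(giskard_model)

# Save the findings as a standalone HTML report.
full_results.to_html("giskard_scan_report.html")
```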
The full scan reports a number of issues, including prompt injection vulnerabilities, stereotype issues, harmful information and more.
Giskard can also be run from a Hugging Face Spaces page, where it has a GUI. I have not investigated this yet, as I prefer interacting with the code directly. Either way, the test outputs are really interesting from a penetration testing perspective.