
Red Teaming LLMs - Automated Pentesting of Large Language Models

  • Esky Man
  • Mar 10, 2024
  • 1 min read

As more and more tools appear on GitHub relating to pentesting large language models, I thought it would be good to test some of them to see how they work and to discover what the outputs are like.


So to kick things off, let's take a look at Giskard, an open-source Python library for testing LLMs that can be found on GitHub.


This Python tool can be run in a Jupyter Notebook instance and can test for a handful of LLM vulnerabilities such as bias, hallucinations, prompt injection attacks, harmful outputs and more.


We begin by running Giskard in a Google Colab environment and installing the required dependencies:

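A minimal sketch of the setup cell, assuming the LangChain-based RAG example from the Giskard docs; the supporting packages beyond giskard itself are my assumptions:

```python
# Notebook cell: install Giskard with its LLM extras plus the libraries used
# in the RAG example (supporting package list assumed, not from the original post).
%pip install "giskard[llm]" --upgrade
%pip install langchain openai faiss-cpu pypdf
```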

Set up your OpenAI API key (as we are using GPT-3.5):

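A minimal sketch; the key value itself is elided:

```python
import os

# Giskard and the RAG chain both read the OpenAI key from the environment.
os.environ["OPENAI_API_KEY"] = "sk-..."
```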

Now let's build the model and test it.


We are using the IPCC's climate change report as the 'knowledge base'; a rough sketch of the whole setup follows below.

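This sketch is modelled on Giskard's LangChain RAG tutorial rather than being the exact code I ran; the report URL, chunk sizes, chain name and test question are illustrative:

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load the IPCC AR6 Synthesis Report and split it into chunks.
loader = PyPDFLoader(
    "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf"
)
documents = loader.load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
)

# Index the chunks in a FAISS vector store and build a retrieval QA chain
# on top of GPT-3.5.
db = FAISS.from_documents(documents, OpenAIEmbeddings())
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever())

# Quick sanity check that the chain answers from the report.
print(climate_qa_chain.run("Is sea level rise avoidable?"))
```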

Now we can scan the LLM:

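With the chain wrapped in a giskard.Model, the scan is a single call. The wrapper function, the model name and description, and restricting this first pass to the hallucination detectors via the `only` argument are my assumptions about what the original notebook did:

```python
import giskard
import pandas as pd

# Wrap the chain so Giskard knows how to feed it questions; the description
# helps the scanner generate domain-specific adversarial prompts.
def predict(df: pd.DataFrame):
    return [climate_qa_chain.run(question) for question in df["question"]]

giskard_model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="Answers questions about climate change based on the IPCC AR6 report",
    feature_names=["question"],
)

# Initial scan limited to hallucination detectors (assumed); the full scan comes later.
scan_results = giskard.scan(giskard_model, only="hallucination")
display(scan_results)
```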

The initial scan results show one hallucination issue, and what's really cool is that Giskard shows you the prompt it used and explains the hallucination: ideal information for a pentest report, I think!


Now we run the full report:

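The full report is the same scan without the detector filter, and to_html exports a standalone report:

```python
# Full scan with every detector enabled, then export a standalone HTML report
# that can be dropped straight into pentest deliverables.
full_results = giskard.scan(giskard_model)
display(full_results)
full_results.to_html("giskard_scan_report.html")
```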

The full scan reports a number of issues, including prompt injection, stereotypes, harmful information and more.



Giskard can also be run from a Hugging Face Spaces page where it has a GUI. I have not investigated this yet as I prefer interacting with the code directly. Either way, the outputs from the test results are really interesting from a penetration testing perspective.
