Swiss DoD research arm releases a report on LLMs in cyber-security (and cites WAN show)

Andrei Chiffa · March 24, 2023

Summary

Cyber-Defense campus, a research arm of the Swiss DoD in everything related to cyber-security and cyber-defense, has released a first version of its report on the impact Large Language Models could have on cyber-security. After an introduction of the tech behind LLM, reviewing current models, and citing some fundamental limitations, they list the main threats they see, with the ones most relevant to the LTT audience being:
- Phishing, especially when they have already compromised a system and can use Microsoft Office 365 Copilot
- Ability to search better and deeper and have summarization of findings
- Vulnerability of the code generated by them, unlikely to be detected by beginner coders most likely to use them to write code
- Private information leakage from interactions with models when they are learning from interactions
- Hijacking systems that are controlled by LLMs (a bit like SQL injections)

In the part where they review Bing Chat and GPT-4, they cite and link the Feb 10th WAN show with a timestamp as the source for their evaluation of abilities and the most likely architecture and training modes used to achieve it.

Quotes

Quote

Based on some public demos LinusTechTips [2023]<Link to WAN show>, in addition to being to perform search queries, BingGPT seems to be capable as well of:

• Perform image-to-text conversion (image object type, color, logo nature)

• Perform basic logic reasoning to split queries (bags of type X that will fit in a trunk of a car Y →size of bags of type X, size of car Y trunk)

• Perform basic logic reasoning to aggregate information acquired from separate queries (bags size along dimensions vs. trunk size along dimensions; similarity of bags sizes to objects for which there is a record of being put into trunk)

• Identification and summarization of customer feedback in a qualitative manner (recurrent points of dissatisfaction or satisfaction rather than a sentiment or a star rating alone)

• Requesting further refinement in case of queries allowing for multiple interpretations

• Explicitly identifying misspelled but semantically similar search terms, correcting and asking the user to clarify in case of ambiguity (Biomass → Bonemass; a videogame boss rather than a fuel or a mass of living organisms)

• Offering realistic, search-based scenarios for possible future outcomes regarding a specific domain, technology, or fiction franchise

• Potentially, parsing and interpreting sound and visuals of videos to provide a summary and integrate such a summary in a query response results.

My thoughts

Looks like a report that falls straight into the intersection of the currently two hot topics for LTT over the last couple of months. I would have been curious to hear Luke's take on the subject.

Also, watching professionally and with straight face Linus and Luke going nuts live for almost an hour, Two Soyjacks style, might have been a new experience for some of them.

Sources

https://arxiv.org/abs/2303.12132