INDEX
Explanations
security-related terms and concepts, such as defense, attack, firewall, and mitigation
Negative Logits
Token        Logit
lore         -0.68
zos          -0.66
hall         -0.63
mitt         -0.62
Bee          -0.61
liam         -0.60
Parenthood   -0.58
wheel        -0.57
Kinn         -0.57
cow          -0.56
Positive Logits
Token        Logit
against      1.13
Against      0.93
against      0.93
iveness      0.92
ously        0.91
ively        0.88
perimeter    0.79
atively      0.78
folios       0.78
heed         0.77
Activations Density 2.130%