INDEX
Explanations
words related to law, government, and security
letters or characters that appear in various contexts in the text
Negative Logits
Token      Logit
Cot        -0.78
Brach      -0.77
Spa        -0.73
Watkins    -0.72
Leopard    -0.71
Gaul       -0.69
Tactics    -0.69
Kau        -0.69
Jagu       -0.69
KP         -0.68
Positive Logits
Token      Logit
vernment   1.05
undred     0.99
actory     0.97
terday     0.95
ancial     0.93
actly      0.93
onymous    0.89
udder      0.87
iable      0.86
iversity   0.86
Activations Density: 0.159%