Explanations
statements or claims about surveillance and authority
Negative Logits
obus      -0.17
zburg     -0.17
uis       -0.16
tte       -0.15
nave      -0.15
Slinky    -0.15
ĩĮ        -0.15
erland    -0.15
olds      -0.15
endale    -0.15
Positive Logits
Cassidy   0.16
Sesso     0.15
Pacific   0.14
unch      0.14
íĥ        0.14
Tic       0.14
Casinos   0.13
ument     0.13
(http     0.13
DataSet   0.13
Activations Density 0.201%