Explanations
references to legal or political issues
Negative Logits
voks      -0.16
iyan      -0.15
cede      -0.15
bekl      -0.14
ragaz     -0.14
altar     -0.14
Wyn       -0.14
Andrews   -0.14
Pf        -0.14
Kinder    -0.14
Positive Logits
og        0.22
och       0.19
på        0.17
å         0.17
aler      0.17
nr        0.17
igh       0.15
ø         0.15
.nr       0.15
isen      0.15
Activations Density 0.182%