INDEX
Explanations
phrases related to legal and political concepts
Negative Logits
ugins       -0.17
sWith       -0.17
rumpe       -0.17
ød          -0.16
ekil        -0.15
ukkit       -0.14
ael         -0.14
epend       -0.14
inks        -0.14
beurette    -0.14
Positive Logits
till        0.21
henne       0.20
ur          0.19
på          0.19
att         0.18
ner         0.18
sig         0.18
emot        0.17
mot         0.17
iv          0.16
Activations Density: 0.046%