Explanations
references to political figures or affiliations
Negative Logits
Token          Logit
[not shown]    -0.66
ataupun        -0.55
,              -0.52
oppure         -0.47
iż             -0.45
besitzt        -0.45
mempunyai      -0.44
mengenai       -0.43
[not shown]    -0.43
...            -0.43
Positive Logits
Token          Logit
XNUMX          1.41
myſelf         1.40
NUMX           1.36
purpoſe        1.30
itſelf         1.24
Monfieur       1.23
Jefus          1.23
pleaſure       1.22
whoſe          1.22
becauſe        1.20
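The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists are commonly produced, assuming the feature has a decoder direction that is projected through the model's unembedding matrix; the function names, argument layouts, and toy numbers are hypothetical and are not taken from this dashboard.

```python
def logit_effects(decoder_dir, unembed):
    """Hypothetical: project a feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats (assumed feature direction).
    unembed: d_model rows, each a list of vocab_size floats (assumed W_U layout).
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[j] * unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # largest first
    return negative, positive


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
effects = logit_effects([1.0, -1.0], [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0]])
neg, pos = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(neg, pos)
```

The sort orders match the dashboard's layout: the negative list starts at the most negative value and the positive list starts at the largest value.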
Activation Density: 0.038%
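Activation density is usually read as the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the helper name are hypothetical). At 0.038%, the feature would be active on about 38 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 38 active positions out of 100,000 sampled tokens.
density = activation_density([0.0] * 99_962 + [1.0] * 38)
print(f"{density:.3%}")  # formats the fraction as a percentage
```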