INDEX
Explanations
statements made by individuals
Negative Logits
inary           -0.71
rats            -0.64
nerg            -0.62
caliber         -0.62
atible          -0.62
Justice         -0.61
OTUS            -0.61
totality        -0.59
swick           -0.59
Organization    -0.57
Positive Logits
hiba             0.81
ansky            0.80
ynthesis         0.77
ometimes         0.75
advertisement    0.74
omething         0.71
kowski           0.68
confidently      0.67
rhet             0.67
é¾               0.67
Activations Density 0.043%