INDEX
Explanations
political terms and affiliations
Negative Logits
oval        -0.60
achine      -0.57
eco         -0.55
Pacific     -0.55
retro       -0.54
reluct      -0.54
revenge     -0.53
FI          -0.53
vengeance   -0.53
unbeliev    -0.51
Positive Logits
().           0.83
attRot        0.79
counterparts  0.78
anymore       0.77
*.            0.76
!.            0.73
":[           0.73
.             0.73
.'            0.71
existed       0.71
Activations Density 0.268%