INDEX
Explanations
references to political figures and events
Spanish words/phrases
Negative Logits
.</      -0.71
$.       -0.70
.''      -0.65
".       -0.65
)."      -0.62
.).      -0.62
.�       -0.62
*.       -0.61
).       -0.61
..."     -0.59
Positive Logits
meanwhile      0.75
ouple          0.53
ccording       0.51
spokesman      0.51
spokeswoman    0.48
udos           0.47
surprisingly   0.47
Variant        0.46
prisingly      0.44
tweeted        0.44
Activations Density: 1.197%