INDEX
Explanations
expressions related to politics and opinions
expressions of conflict and opposition
Negative Logits
›            -0.68
)",          -0.67
Enlarge      -0.66
Previously   -0.64
"),          -0.63
),"          -0.63
")           -0.63
Initially    -0.62
","          -0.62
earable      -0.60
Positive Logits
coward         0.78
goddamn        0.74
damned         0.74
patri          0.72
Genocide       0.72
hypocritical   0.71
dehuman        0.71
hypocrisy      0.71
fools          0.70
fucking        0.70
Activations Density 2.242%
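
The density figure can be read as the share of token positions on which the feature fires. This page does not state how the number is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions on which the feature
    is active. The dashboard itself does not document its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 2.242% would mean the feature is active on roughly 22 of every 1,000 sampled token positions.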