INDEX
Explanations
explicit expressions of strong opinions
expressions of strong sentiments or reactions
Negative Logits
edIn        -0.74
Wik         -0.65
_-          -0.63
Ct          -0.63
indle       -0.62
iba         -0.61
yrinth      -0.61
Interface   -0.59
Cla         -0.59
anian       -0.58
Positive Logits
impression  0.84
advice      0.79
icum        0.79
Rosenstein  0.70
onsense     0.69
counsel     0.67
summary     0.66
chance      0.66
arsh        0.65
amnesty     0.64
Activations Density 0.213%