INDEX
Explanations
phrases related to social and political actions or statements
repetitive phrases or expressions highlighting quantifiers and negations
Negative Logits
Redditor        -0.79
Also            -0.76
Alternatively   -0.70
MAN             -0.67
Additionally    -0.66
also            -0.66
additionally    -0.65
Nare            -0.64
Also            -0.63
rm              -0.63
Positive Logits
whatever        0.93
etc             0.86
EntityItem      0.75
clot            0.68
etc             0.66
cknow           0.66
decency         0.65
blah            0.62
sensit          0.62
dq              0.61
Activations Density: 0.260%