INDEX
Explanations
words related to disapproval or criticism
instances of the word "rep" in various contexts related to reputation, representation, or reparations
Negative Logits
glers                     -0.92
ppo                       -0.83
ERY                       -0.77
BuyableInstoreAndOnline   -0.77
Abyss                     -0.74
Cage                      -0.73
Ducks                     -0.72
Bruins                    -0.71
ggle                      -0.69
gers                      -0.68
Positive Logits
utations                   1.28
rint                       1.06
arations                   1.03
rieve                      1.00
ainted                     0.99
utation                    0.97
rehens                     0.95
onse                       0.95
uted                       0.93
ublic                      0.93
Activations Density: 0.010%