Explanations
Phrases expressing critique or strong opinions
Phrases indicating the concept of being "out"
Negative Logits
Token        Logit
sembly       -0.88
GOODMAN      -0.74
odies        -0.66
Users        -0.63
Emin         -0.63
Funds        -0.63
aders        -0.62
colleg       -0.61
Yard         -0.60
plings       -0.59
Positive Logits
Token        Logit
edly          0.73
rejection     0.71
stand         0.68
Repe          0.68
lined         0.67
lie           0.66
Ign           0.65
Latin         0.65
Country       0.64
butt          0.63
Activation density: 0.076%