Explanations
phrases indicating someone as part of a group or team
Negative Logits
zeb            -0.72
agre           -0.72
chall          -0.66
ption          -0.64
lapt           -0.63
diversion      -0.62
miscarriage    -0.61
mathemat       -0.61
anytime        -0.61
streng         -0.60
Positive Logits
few            0.80
many           0.78
those          0.75
those          0.75
several        0.73
abet           0.69
wealthiest     0.68
dozens         0.68
amen           0.66
many           0.66
Activations Density 0.069%