INDEX
Explanations
adjectives or nouns related to things that are highly successful or have a strong impact
Negative Logits
Token          Logit
ploma          -0.70
plane          -0.67
planes         -0.67
yg             -0.66
loads          -0.65
PART           -0.65
Hop            -0.65
IGH            -0.64
ppers          -0.63
worthiness     -0.63
Positive Logits
Token          Logit
altru          1.09
deterrent      0.84
coping         0.82
atives         0.80
ative          0.79
aneously       0.77
contraception  0.74
antid          0.73
policing       0.72
iating         0.71
Activations Density: 0.088%