INDEX
Explanations
references to winning or victory
terminology related to victories and wins
Negative Logits
Token         Logit
intestine     -0.71
encyclopedia  -0.70
bowel         -0.70
erity         -0.68
Uz            -0.68
incorpor      -0.65
notor         -0.65
trave         -0.65
pronouns      -0.62
umn           -0.62
Positive Logits
Token         Logit
nings         1.22
now           0.95
iem           0.95
ners          0.85
athon         0.82
throp         0.80
iors          0.79
ces           0.78
hardt         0.78
win           0.77
Activation density: 0.023%
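
As a rough illustration of what a density figure like this could mean, the sketch below assumes it is the fraction of sampled token positions on which the feature's activation is above zero. That reading is an assumption; the page does not define the statistic. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 23 of which activate the feature.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.023%
```

Under this reading, a density of 0.023% corresponds to roughly 23 activating tokens per 100,000 sampled.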