INDEX
Explanations
the word "both" in various contexts
Negative Logits
Token         Logit
xus           -0.87
frac          -0.71
uez           -0.71
Caption       -0.71
hibition      -0.66
kamp          -0.65
ventus        -0.65
regation      -0.62
ilit          -0.61
nce           -0.61
Positive Logits
Token         Logit
sexes         1.36
genders       1.13
sides         1.09
halves        0.83
domestically  0.83
verbally      0.80
orally        0.80
individually  0.79
internally    0.75
physically    0.72
Activations Density: 0.033%