INDEX
Explanations
References to the number four, or phrases that include the word "four."
Negative Logits
Token      Logit
inous      -0.17
stead      -0.17
st         -0.17
xor        -0.15
spons      -0.15
em         -0.14
uras       -0.14
end        -0.14
abs        -0.14
chter      -0.14
Positive Logits
Token      Logit
Seasons    0.22
cade       0.20
-legged    0.20
fours      0.19
rier       0.19
corners    0.19
Leaf       0.18
ier        0.18
Winds      0.18
_season    0.18
Activations Density 0.043%
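
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the fraction of tokens on which the feature has a nonzero activation. The dashboard does not state this definition, so treat it as an assumption. The function name activation_density and the sample data are hypothetical and chosen only to reproduce the 0.043% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 43 of 100,000 tokens.
sample = [1.5] * 43 + [0.0] * 99_957
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.043% corresponds to roughly 43 active tokens per 100,000, which fits a sparse feature tied to a narrow concept such as the word "four."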