INDEX
Explanations
the word "any" followed by a number
the occurrence of the word "any."
Negative Logits
seless      -0.77
rex         -0.74
plex        -0.73
rox         -0.68
ip          -0.67
gal         -0.66
ulo         -0.66
grad        -0.64
yrinth      -0.64
itals       -0.64
Positive Logits
THING       1.30
WHERE       1.04
significant 0.91
meaningful  0.88
place       0.86
particular  0.84
ONE         0.83
additional  0.82
longer      0.81
ones        0.80
Activations Density 0.066%