INDEX
Explanations
the presence of the word "Fox" in various contexts
Negative Logits
ninger   -0.15
\CMS     -0.15
bove     -0.15
exus     -0.15
еÑĢж     -0.15
antu     -0.15
ocal     -0.15
ignum    -0.15
ered     -0.15
abant    -0.15
Positive Logits
conn     0.20
xy       0.19
worthy   0.18
boro     0.18
croft    0.17
ionale   0.16
enberg   0.16
(es      0.15
ional    0.15
spl      0.15
Activations Density 0.009%