INDEX
Explanations
instances of words indicating types of organisms or species
Negative Logits
zem      -0.16
cts      -0.15
ontent   -0.15
edn      -0.14
mı       -0.14
ommen    -0.14
hdr      -0.14
gii      -0.14
ACT      -0.13
Ñĸв      -0.13
Positive Logits
affer    0.15
<dim     0.15
å±ĭ      0.15
orca     0.14
665      0.14
eff      0.14
ourg     0.14
fist     0.14
igan     0.14
ensed    0.13
Activations Density 0.003%