INDEX
Explanations
the presence of specific names or references, particularly those related to individuals or authors
Negative Logits
ariat    -0.16
arters   -0.15
ius      -0.15
grand    -0.15
æĿ¡      -0.15
uro      -0.14
grand    -0.14
g        -0.14
egr      -0.14
antry    -0.14
Positive Logits
asmus    0.23
ector    0.21
antz     0.20
cola     0.19
oded     0.18
atz      0.17
flater   0.16
udit     0.16
hyth     0.16
ifes     0.16
Activations Density 0.031%