Explanations
specific names, possibly of individuals or organizations involved in a narrative
Negative Logits
Token      Logit
ulative    -0.16
ldb        -0.15
enci       -0.14
frei       -0.14
ÏĦÏħ       -0.14
.rd        -0.14
bbing      -0.13
anik       -0.13
nett       -0.13
kB         -0.13
Positive Logits
Token      Logit
ognito     0.21
(Int       0.21
quo        0.17
gın        0.17
xs         0.16
spell      0.15
upro       0.15
amic       0.15
nat        0.15
ÅĽÄĩ       0.15
Activations Density: 0.146%
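
As a rough illustration of how a density figure like this could be computed, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the page does not state how its density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 2,000 (not real feature data).
sample = [0.0] * 1997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.150%
```

Under this reading, a density of 0.146% would mean the feature fires on roughly 1 or 2 tokens out of every 1,000 sampled.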