Explanations
references to individuals and their roles or identities
Negative Logits
ovice    -0.18
796      -0.15
occo     -0.15
ÙİØŃ     -0.15
olie     -0.15
neys     -0.15
iffer    -0.14
Pitch    -0.14
itech    -0.14
Techn    -0.14
Positive Logits
Literature    0.42
literature    0.41
literary      0.36
Liter         0.36
liter         0.36
-liter        0.35
liter         0.34
Literary      0.31
æĸĩåѦ         0.31
novels        0.30
Activations Density: 0.274%