INDEX
Explanations
Instances of specific letters followed by numbers, indicating a pattern related to location or categorization.
Negative Logits
aptop      -0.21
ikes       -0.19
ots        -0.19
ike        -0.17
ocking     -0.17
inker      -0.17
abs        -0.16
ance       -0.16
isten      -0.16
ife        -0.15
Positive Logits
ichten      0.20
om          0.18
usat        0.18
lund        0.17
el          0.17
orraine     0.17
الة         0.16
.editor     0.16
ucc         0.16
assed       0.16
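Lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The page does not say how its values were computed, so this is an assumption. The following is a minimal pure-Python sketch. The function names `logit_effects` and `top_logits`, the tokens `tok_a`/`tok_b`/`tok_c`, and all vector values are hypothetical toy data, not taken from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Dot product of the feature's decoder direction with each token's
    unembedding vector (assumed method; toy inputs only)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(effects, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-d decoder direction and unembedding rows.
    decoder_dir = [1.0, -0.5]
    unembed = {"tok_a": [0.2, 0.0], "tok_b": [-0.1, 0.2], "tok_c": [0.1, -0.1]}
    neg, pos = top_logits(logit_effects(decoder_dir, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

In a real model the vocabulary has tens of thousands of rows, so this would normally be a single matrix-vector product followed by a top-k. The dictionary form here only makes the ranking step explicit.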
Activation density: 0.034%
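Activation density is typically the fraction of token positions on which the feature's activation is above zero. The page does not state its threshold or the dataset it sampled, so treat that definition as an assumption. Under it, 0.034% means the feature fires on about 34 of every 100,000 tokens. The following is a minimal sketch in which `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds
    `threshold` (assumed definition; threshold is a hypothetical parameter)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy data: 34 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_966 + [1.0] * 34
    density = activation_density(acts)
    print(f"{density:.3%}")
```

A density this low marks a sparse feature. Only a handful of contexts per hundred thousand tokens activate it, which is consistent with the narrow pattern described in the explanation.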