INDEX
Explanations
proper nouns, particularly names of individuals
Negative Logits
acle      -0.18
.datab    -0.18
ismet     -0.16
lapse     -0.16
046       -0.15
oka       -0.14
prova     -0.14
oken      -0.14
uka       -0.14
レット    -0.14
Positive Logits
sam       0.23
Sam       0.21
Sam       0.16
SAM       0.16
Bulk      0.15
Samantha  0.15
سام       0.15
SAM       0.15
mic       0.15
iform     0.14
Activations Density: 0.016%