INDEX
Explanations
proper nouns, particularly names of people and places
Negative Logits
Token      Logit
řet        -0.16
itus       -0.15
ensa       -0.15
antro      -0.15
isecond    -0.15
rette      -0.15
ustum      -0.14
robat      -0.14
reon       -0.14
iflower    -0.14
Positive Logits
Token      Logit
son        0.15
"          0.14
сты        0.14
“          0.13
alk        0.13
drs        0.13
ba         0.13
Peters     0.13
imb        0.13
'          0.13
Activations Density 0.396%