Explanations
proper nouns, especially names of individuals
names and terms associated with individuals or entities
Negative Logits
イト          -0.70
CHA           -0.66
apest         -0.64
»Ĵ            -0.64
vertisement   -0.62
é¾            -0.61
ドラゴン      -0.60
imaginary     -0.60
SIGN          -0.60
imitation     -0.60
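Some entries in these lists (for example ãĤ¤ãĥĪ, »Ĵ and é¾) look like mojibake because they are displayed as raw token strings. Assuming the model uses GPT-2's byte-level BPE vocabulary (the ãĥīãĥ©ãĤ´ãĥ³ token suggests this), each displayed character stands for exactly one byte, and the mapping can be inverted to recover the underlying UTF-8 text. A minimal sketch that rebuilds GPT-2's byte-to-character table; the helper name token_to_text is hypothetical:

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the byte -> printable-character table used by GPT-2's byte-level BPE."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable stand-in are shifted to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_text(token: str) -> str:
    """Hypothetical helper: turn a displayed token string back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that hold only part of a multi-byte character cannot decode cleanly.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ¤ãĥĪ", "ãĥīãĥ©ãĤ´ãĥ³", "»Ĵ", "é¾"]:
    print(repr(tok), "->", repr(token_to_text(tok)))
```

Under this mapping ãĤ¤ãĥĪ decodes to イト and ãĥīãĥ©ãĤ´ãĥ³ to ドラゴン, while »Ĵ and é¾ are fragments of multi-byte characters and decode only to replacement characters (U+FFFD) on their own.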
Positive Logits
eper          0.74
wed           0.70
aults         0.69
ecast         0.68
Vul           0.63
fried         0.63
Finn          0.63
abo           0.63
pard          0.63
ember         0.61
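Lists like the negative and positive logits above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest entries. The sketch below assumes exactly that readout (no layer-norm scaling or unembedding bias), uses random toy weights and a made-up vocabulary rather than the real model, and names the helper top_logit_effects hypothetically:

```python
import numpy as np


def top_logit_effects(w_dec_feature, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the feature's direct effect on the logits."""
    logits = w_dec_feature @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)            # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: sizes and weights are invented, not taken from any real model.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
W_U = rng.normal(size=(d_model, vocab_size))   # unembedding matrix
w_dec = rng.normal(size=d_model)               # one feature's decoder direction
w_dec /= np.linalg.norm(w_dec)                 # unit-normalise the direction
vocab = [f"tok{i}" for i in range(vocab_size)]

neg, pos = top_logit_effects(w_dec, W_U, vocab, k=5)
print("Negative:", [(t, round(v, 2)) for t, v in neg])
print("Positive:", [(t, round(v, 2)) for t, v in pos])
```

Sorting once with argsort and slicing both ends gives the two lists from a single pass over the vocabulary.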
Activation density: 0.082%
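Activation density is the share of token positions on which the feature is active, so 0.082% corresponds to roughly 82 tokens in every 100,000. A minimal sketch, assuming "active" means an activation strictly above zero and using synthetic activations constructed to fire on exactly 82 of 100,000 positions (the helper name activation_density is hypothetical):

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Synthetic data: 100,000 token positions, the feature fires on 82 of them.
rng = np.random.default_rng(0)
acts = np.zeros(100_000)
firing = rng.choice(acts.size, size=82, replace=False)
acts[firing] = rng.uniform(0.5, 5.0, size=82)  # positive activations only

density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # Activation density: 0.082%
```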