INDEX
Explanations
proper nouns related to individuals, particularly names
references to specific individuals and brands
Negative Logits
ences      -0.82
ential     -0.81
enced      -0.68
--------------------------------------------------------  -0.67
LINE       -0.64
tein       -0.63
swick      -0.63
body       -0.63
PATH       -0.63
ilee       -0.62
Positive Logits
Mats       1.26
ushima     1.09
ura        1.00
mats       0.98
wana       0.79
misunder   0.78
awan       0.77
uchin      0.76
aido       0.75
Kats       0.73
Activations Density 0.009%