Explanations
proper nouns related to individuals or names
names of individuals, particularly those in the entertainment industry
Negative Logits
ruary            -0.74
Akin             -0.71
theless          -0.69
orphans          -0.66
Moroccan         -0.62
representation   -0.62
Laos             -0.61
preservation     -0.61
hospitality      -0.59
Dak              -0.59
Positive Logits
kov              0.81
andowski         0.74
angan            0.74
agan             0.73
agi              0.71
henko            0.70
oyal             0.70
Ĥ¬               0.69
emort            0.69
attr             0.69
Activations Density: 0.161%