INDEX
Explanations
the last names of celebrities or famous individuals
specific names and proper nouns, particularly those related to people or characters
Negative Logits
Token      Logit
ãĤ¬        -0.84
CAP        -0.77
ESA        -0.74
arrang     -0.72
poss       -0.70
Manit      -0.70
DIRECT     -0.70
Cort       -0.70
Equip      -0.70
helicop    -0.69
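The first token in the negative-logit list, ãĤ¬, looks like the byte-level BPE spelling of a multi-byte character rather than display damage. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-unicode table; the source does not name the model, so this is an assumption, and `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table.

    Assumption: the dashboard's tokenizer uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Tokens that split a character in half show up as U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("\u00e3\u0124\u00ac"))  # the token listed as ãĤ¬
```

Under that assumption the three characters stand for the bytes E3 82 AC, which is the UTF-8 encoding of the katakana character ガ.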
Positive Logits
Token      Logit
y          1.46
ys         1.31
yon        1.30
yles       1.29
yx         1.28
Y          1.27
yl         1.25
yan        1.21
yg         1.19
ym         1.16
Activations Density 0.271%
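One common reading of an activation density figure like the one above is the fraction of token positions on which the feature fires at all. The sketch below assumes that definition (the source does not state how the 0.271% was computed); `activation_density` and the sample values are hypothetical and for illustration only.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) feature activation.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for v in values if v > threshold)
    return active / len(values)


if __name__ == "__main__":
    # Made-up activations: the feature fires on 1 of 8 positions.
    sample = [0.0, 0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0]
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.271% would mean the feature is active on roughly 27 of every 10,000 tokens in the evaluated text.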