Explanations
phrases regarding human actions and interactions
Negative Logits
inho    -0.16
ama     -0.14
čast    -0.14
つぶ    -0.14
тебя    -0.13
ssp     -0.13
senin   -0.13
。你    -0.13
тебе    -0.13
hạ      -0.13
Positive Logits
Mr      1.35
Mr      1.19
Ms      0.97
mr      0.90
Mrs     0.81
Ms      0.79
mr      0.68
_mr     0.68
MR      0.66
Mrs     0.66
Activations Density 0.366%