INDEX
Explanations
phrases involving actions related to individuals and their interactions
Negative Logits
Token      Logit
itſelf     -0.88
Efq        -0.80
(\<        -0.73
iſt        -0.72
Reſ        -0.71
Diſ        -0.71
stiefel    -0.70
Majefty    -0.70
Beſ        -0.69
reaſon     -0.69
Positive Logits
Token      Logit
را         0.74
ceğini     0.73
devamını   0.70
ığını      0.70
larını     0.69
音を       0.69
MENAFN     0.68
lerini     0.67
meyi       0.67
ätä        0.66
Activations Density: 0.052%