Explanations
friends and close relationships
Negative Logits
Token    Value
ص    0.49
所有    0.45
《    0.43
UD    0.43
on    0.41
ب    0.40
Field    0.39
on    0.39
it    0.39
AB    0.39
Positive Logits
Token    Value
👬    0.59
👭    0.59
輩    0.54
ktorí    0.53
spouse    0.51
👫    0.50
fiancé    0.49
hood    0.49
الاعزاء    0.49
colleague    0.48
Activations Density 0.082%