Explanations
expressions related to friendship and close relationships
Negative Logits
urement    -0.15
айд        -0.15
elter      -0.15
ushima     -0.14
ryo        -0.13
ddit       -0.13
factor     -0.13
oleč       -0.13
fusion     -0.13
oub        -0.13
Positive Logits
friends    0.77
friend     0.69
FRIEND     0.65
Friends    0.65
friends    0.62
朋友       0.60
Friends    0.60
Friend     0.57
friend     0.56
riends     0.53
Activations Density 0.286%