INDEX
Explanations
expressions of kindness and hospitality in social interactions
Negative Logits
partagé            -0.55
enumi              -0.53
архивлан           -0.52
doskon             -0.52
disambiguazione    -0.51
Escolar            -0.49
ponym              -0.49
marriages          -0.49
ärin               -0.47
pamię              -0.47
Positive Logits
welcoming          1.07
welcome            0.88
welcomes           0.85
inviting           0.81
welcomed           0.77
invitation         0.76
hostility          0.75
ramah              0.75
hostile            0.75
Welcome            0.75
Activations Density: 0.090%