INDEX
Explanations
actions and their effects in interpersonal and social contexts
Negative Logits
ibo      -0.16
iyel     -0.16
acht     -0.15
ickey    -0.15
Pazar    -0.15
.base    -0.14
onders   -0.14
ерг      -0.14
hv       -0.14
athon    -0.14
Positive Logits
differently   0.23
vlastně       0.17
obre          0.17
differs       0.17
вообще        0.16
differ        0.16
/react        0.16
obra          0.15
iffin         0.15
ombres        0.15
Activations Density 0.175%
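
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        return 0.0  # no positions observed, so nothing fired
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Hypothetical sample: 4000 token positions, 7 of which fire.
sample = [0.0] * 3993 + [1.5] * 7
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.175% would correspond to the feature firing on roughly 7 of every 4000 tokens.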