INDEX
Explanations
actions related to requests and obligations in interpersonal contexts
Negative Logits
wide      -0.17
urum      -0.15
Bye       -0.15
itself    -0.15
uem       -0.15
лоп       -0.15
Hab       -0.14
ög        -0.14
otts      -0.14
atte      -0.14
Positive Logits
bợi       0.16
одар      0.16
!=(       0.15
.freeze   0.14
by        0.14
Spectrum  0.14
LD        0.14
oleh      0.14
icular    0.13
ihad      0.13
Activations Density 0.352%