INDEX
Explanations
instances of emotional and physical pain descriptors
Negative Logits
ÄĻd      -0.18
aim      -0.16
ivot     -0.16
ungan    -0.15
omed     -0.15
eren     -0.15
enek     -0.15
rien     -0.15
indeki   -0.15
Hüs      -0.15
Positive Logits
Torrent  0.15
yper     0.15
nett     0.15
alt      0.14
_CTL     0.14
Netz     0.14
Ott      0.14
alt      0.14
ZR       0.14
æº       0.14
Activations Density 0.012%