Explanations
expressions related to experiencing pain or distress
Negative Logits
ehler      -0.19
eb         -0.17
vivo       -0.16
trinsic    -0.15
igham      -0.15
ollapsed   -0.15
izable     -0.15
orus       -0.14
author     -0.14
asant      -0.14
Positive Logits
ityEngine  0.17
IDA        0.16
flate      0.16
ERSHEY     0.15
(Status    0.15
бот        0.15
zeug       0.15
prob       0.15
zcze       0.14
illance    0.14
Activations Density 0.021%