Explanations
verbs expressing positive or negative evaluations of actions and statements
Negative Logits
liter     -0.16
ados      -0.15
bet       -0.15
andre     -0.15
Ố         -0.14
ASC       -0.14
Пок       -0.14
ober      -0.14
irror     -0.14
licer     -0.14
Positive Logits
chio      0.16
ival      0.16
icle      0.15
ä¾        0.15
ics       0.14
igham     0.13
(pg       0.13
IVAL      0.13
pres      0.13
icles     0.13
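Lists like the two above are commonly read as a feature's direct effect on the output logits: the feature's decoder direction projected through the model's unembedding matrix, then sorted. The sketch below assumes exactly that (and ignores any final LayerNorm). The function `top_logit_effects`, its parameters `w_dec_row` and `w_unembed`, and the toy data are hypothetical illustrations, not the dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most promoted and k most suppressed tokens for one feature.

    Assumed layout (hypothetical):
      w_dec_row: (d_model,) decoder direction of the feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     token strings aligned with the columns of w_unembed
    """
    effects = w_dec_row @ w_unembed          # direct effect on every logit
    order = np.argsort(effects)              # ascending order of effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 4-dimensional residual stream and a 6-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
w_unembed = np.zeros((4, 6))
w_unembed[0] = [0.5, -0.2, 0.1, -0.4, 0.3, 0.0]
w_dec_row = np.array([1.0, 0.0, 0.0, 0.0])
pos, neg = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
```

Here only the first residual dimension is active, so the ranking simply follows the first row of the toy unembedding matrix.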
Activation density: 0.073%
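Assuming activation density means the fraction of token positions where the feature's activation is above zero, 0.073% corresponds to about 73 active tokens per 100,000. The helper `activation_density` and its `threshold` parameter below are a hypothetical sketch of that calculation, not the dashboard's implementation.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy data: 73 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:73] = 1.0
density = activation_density(acts)
label = f"{density:.3%}"
```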