Explanations
words related to criticism or judgment
phrases that express different types or kinds of things
Negative Logits
Token       Logit
dn          -0.89
gor         -0.79
ĸļ          -0.77
UF          -0.77
idelines    -0.76
alf         -0.75
в           -0.75
ï¸          -0.74
none        -0.74
APS         -0.73
Positive Logits
Token        Logit
thing        1.56
stuff        1.12
behavior     1.05
crap         1.01
mischief     0.98
behaviour    0.98
shenanigans  0.97
situation    0.96
mentality    0.95
activity     0.93
Activations Density 0.053%