INDEX
Explanations
specific names or terms related to individuals, particularly political figures or leaders
Negative Logits
yo
-0.19
tainment
-0.15
Lair
-0.15
ла
-0.15
Yates
-0.14
obox
-0.14
ourcem
-0.14
ExecutionContext
-0.14
айт
-0.14
¯
-0.14
Positive Logits
ût
0.23
erg
0.18
osy
0.16
irse
0.15
BAB
0.15
xygen
0.15
Insensitive
0.15
ось
0.15
anan
0.14
chy
0.14
Activations Density 0.022%