Explanations
contractions and negations in statements
Negative Logits
خی            -0.16
361           -0.15
fuck          -0.15
обст          -0.15
Fucking       -0.15
ITTE          -0.14
.metro        -0.14
TestFixture   -0.14
530           -0.14
nig           -0.14
Positive Logits
.uc           0.17
Dil           0.17
št            0.16
andas         0.15
Pur           0.15
رود           0.14
Ultimate      0.14
hr            0.13
itional       0.13
azing         0.13
Activations Density 0.001%