INDEX
Explanations
negations or contradictions
negations or phrases indicating the absence of something
Negative Logits
Token        Logit
éĥ           -0.75
å¥           -0.74
æ©           -0.73
ç·           -0.70
LG           -0.68
ãĤ¼          -0.67
çļ           -0.66
Inventory    -0.65
ixel         -0.64
åº           -0.64
Positive Logits
Token          Logit
hin            1.18
epad           1.17
icably         1.17
necessarily    1.16
icable         1.14
exactly        1.00
bothered       0.94
gonna          0.94
eworthy        0.93
orious         0.93
Activations Density: 0.113%