INDEX
Explanations
phrases expressing negation or contradiction
negations or expressions of denial
Negative Logits
éĥ            -0.80
interstitial  -0.75
éĹĺ           -0.71
å¥            -0.71
itech         -0.70
lined         -0.68
è»            -0.66
ãĤ¼ãĤ¦ãĤ¹     -0.65
Steps         -0.64
arsen         -0.63
Positive Logits
necessarily   1.60
icably        1.21
withstanding  1.12
exactly       1.08
icable        1.06
yet           0.96
orious        0.92
always        0.91
necess        0.90
entirely      0.87
Activations Density 0.220%