INDEX
Explanations
expressions emphasizing intensity or emphasis
emphatic modifiers that indicate a strong degree of certainty or absolute statements
Negative Logits
yip        -0.91
rers       -0.80
anwhile    -0.79
olester    -0.76
ourses     -0.72
sburg      -0.72
*/(        -0.72
ulators    -0.72
EStream    -0.71
achu       -0.70
Positive Logits
understandable  1.04
predictable     0.88
justified       0.87
unrelated       0.85
unnecessary     0.82
unacceptable    0.81
antit           0.80
different       0.78
coinc           0.78
legit           0.78
Activations Density 0.046%