Explanations
references to skepticism and critical thinking in discourse
Negative Logits
etta        -0.16
ÏĥÏĦά       -0.16
Johns       -0.13
Codec       -0.13
README      -0.13
اÙĦØ¥ÙĨجÙĦÙĬزÙĬØ©  -0.13
æ¦Ĥ         -0.13
te          -0.13
енз         -0.13
stro        -0.12
Positive Logits
emphasis    0.48
emphasis    0.45
ital        0.35
bold        0.31
phasis      0.30
emphasize   0.29
emph        0.29
Em          0.29
Ital        0.28
emphasized  0.28
Activations Density: 0.037%