Explanations
instances of personal pronouns and expressions of uncertainty or lack of knowledge
Negative Logits
utdown     -0.15
_almost    -0.15
ura        -0.14
åŀ         -0.14
ugins      -0.14
ymm        -0.14
oubles     -0.14
resi       -0.13
=title     -0.13
Jag        -0.13
Positive Logits
不知道     0.36
unknown    0.35
descon     0.34
unknown    0.33
don        0.31
Unknown    0.31
Unknown    0.30
UNKNOWN    0.30
unsure     0.29
_unknown   0.28
Activations Density 0.231%