INDEX
Explanations
phrases indicating negative extreme scenarios or outcomes
Negative Logits
Token            Logit
odd              -0.18
opportunity      -0.17
antar            -0.17
441              -0.17
alphabet         -0.15
ees              -0.14
opportunities    -0.14
zos              -0.14
Lod              -0.14
er               -0.14
Positive Logits
Token            Logit
-case            0.39
Case             0.25
case             0.25
case             0.23
Case             0.21
_case            0.21
CASE             0.18
(case            0.18
case             0.18
offender         0.17
Activations Density: 0.017%