Explanations
statements indicating existence or descriptions of conditions
Negative Logits
this       -0.15
soever     -0.14
vis        -0.14
uba        -0.14
379        -0.14
opause     -0.14
there      -0.14
port       -0.14
there      -0.14
763        -0.13
Positive Logits
how        0.23
where      0.21
why        0.21
happening  0.19
how        0.16
AFX        0.16
aran       0.15
why        0.15
happ       0.15
supposed   0.15
Activations Density: 0.108%