Explanations
mentions of significant or high-impact events
Negative Logits
"which"      -0.28
","          -0.25
"otherwise"  -0.25
"or"         -0.22
"which"      -0.22
"otherwise"  -0.20
"thereby"    -0.20
"and"        -0.20
"thus"       -0.20
"but"        -0.19
Positive Logits
"there"  0.28
"it"     0.22
"there"  0.22
"if"     0.21
"they"   0.20
"由于"   0.20  (Chinese: "because of")
"we"     0.20
"if"     0.20
"they"   0.19
"we"     0.19
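The two lists above show the tokens whose logits this feature most suppresses and most boosts. Below is a minimal sketch of how such lists are commonly derived. It assumes, without confirmation from this page, that each score is the feature's decoder direction projected through the model's unembedding matrix. The function names `logit_effects` and `top_and_bottom`, along with the toy matrices, are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns).
    Assumption: this is how the dashboard scores are computed."""
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembed rows")
    vocab_size = len(unembed[0])
    # One score per vocabulary token: dot product of the direction with that column.
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])  # ascending
    bottom = [(vocab[j], effects[j]) for j in order[:k]]
    top = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return top, bottom


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens (values made up).
vocab = ["there", "which", "if", "and"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, -0.2, 0.1, -0.6],
]
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

With real weights, duplicate-looking entries such as "there" appearing twice would usually be distinct vocabulary items, for example with and without a leading space.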
Activations Density 0.476%
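The density figure is easier to read as a rate: 0.476% means the feature fires on roughly 1 in 210 tokens. Below is a minimal sketch of the arithmetic. It assumes density is the percentage of token positions whose activation is above zero. This page does not state the threshold or the token sample used. The name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds
    `threshold` (assumed to be 0 here)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 476 active positions out of 100,000 tokens.
sample = [1.0] * 476 + [0.0] * (100_000 - 476)
density = activation_density(sample)
print(f"{density:.3f}%")                      # 0.476%
print(f"~1 in {100 / density:.0f} tokens")    # ~1 in 210 tokens
```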