Explanations
specific words related to reasons or causes
Negative Logits
rolling       -0.68
in            -0.65
,             -0.65
or            -0.62
and           -0.62
heavily       -0.62
management    -0.62
shaped        -0.61
mixed         -0.61
Media         -0.60
Positive Logits
because       1.97
until         1.93
when          1.91
without       1.87
instead       1.86
against       1.85
along         1.82
during        1.76
unless        1.76
among         1.73
Activations Density: 0.036%