INDEX
Explanations
phrases that suggest causation or responsibility related to societal issues
Negative Logits
implications          -0.17
DependencyProperty    -0.16
repercussions         -0.15
oise                  -0.15
impact                -0.14
DEFINE                -0.14
าศ                    -0.14
amba                  -0.14
merce                 -0.14
lip                   -0.14
Positive Logits
why                   0.38
why                   0.29
recent                0.26
Why                   0.25
observed              0.25
为什么                 0.24
success               0.24
Why                   0.24
WHY                   0.23
почему                0.23
Activations Density 0.241%