Explanations
phrases that indicate causes and reasons for various situations or events
Negative Logits
olt       -0.16
atis      -0.14
版        -0.13
oltip     -0.13
merce     -0.13
ipsis     -0.13
ře        -0.13
.Actor    -0.13
.less     -0.13
px        -0.13
Positive Logits
why       0.26
why       0.20
success   0.19
Why       0.17
为什么    0.16
Why       0.16
observed  0.16
recent    0.15
WHY       0.15
نب        0.15
Activations Density 0.100%