INDEX
Explanations
phrases discussing causes and their effects in various contexts, particularly health and societal issues
Negative Logits
ActivityIndicatorView  -0.15
yang                   -0.15
amba                   -0.14
illow                  -0.14
lsx                    -0.14
wen                    -0.14
uple                   -0.14
olt                    -0.14
indem                  -0.14
atis                   -0.13
Positive Logits
why       0.33
why       0.26
observed  0.25
为什么    0.24
Why       0.23
Why       0.21
success   0.20
WHY       0.20
почему    0.20
obs       0.19
Activations Density 0.198%
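
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires. The source does not say how the value was computed, so the sketch below is only one plausible reading: it assumes "fires" means an activation strictly above a threshold (0 by default). The function name `activation_density` and the toy data are hypothetical, made up for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active on a token when its activation
    is strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 2 of them active.
toy = [0.0] * 998 + [1.7, 0.4]
print(f"{activation_density(toy):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.198% would mean the feature is active on roughly 2 of every 1,000 tokens in the sampled text.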