INDEX
Explanations
phrases that indicate uncertainty or questioning regarding the effectiveness of a treatment
Negative Logits
plib      -0.18
ÑĥлÑİ     -0.17
æĭĶ       -0.16
esi       -0.16
kyt       -0.14
esen      -0.14
adows     -0.14
γκο       -0.14
nell      -0.14
raph      -0.13
Positive Logits
chances   0.20
because   0.20
Because   0.16
mean      0.16
means     0.15
porque    0.15
enge      0.15
because   0.15
yal       0.15
mean      0.15
Activations Density 0.083%