Explanations
negations and cautions regarding actions or recommendations
Negative Logits
ux         -0.15
AMPL       -0.15
ssel       -0.14
.pivot     -0.14
izon       -0.14
tring      -0.14
rung       -0.14
Gos        -0.14
somehow    -0.14
ubu        -0.14
Positive Logits
exceed     0.30
EVER       0.28
ever       0.26
touch      0.23
hesitate   0.22
exceeds    0.22
worry      0.22
превыш     0.22
exceeding  0.21
-ever      0.21
Activations Density 0.121%