Explanations
phrases expressing doubt and uncertainty
Negative Logits
Token    Logit
utor     -0.17
rif      -0.17
edla     -0.16
INTR     -0.15
tir      -0.15
tuk      -0.15
hong     -0.15
darm     -0.14
_CLIP    -0.14
ivor     -0.14
Positive Logits
Token        Logit
somewhere    0.22
somehow      0.18
similarly    0.17
since        0.16
alguna       0.16
irgend       0.15
algún        0.15
somew        0.15
considering  0.14
ason         0.14
Activations Density 0.271%
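
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions at which the feature's activation is above a threshold. This is an assumption about the metric, not a description of how this page's tool calculates it; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 1000 token positions.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a density of 0.271% would mean the feature is active on roughly 27 out of every 10,000 tokens sampled.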