Explanations
phrases that express hesitation or caution in making claims
Negative Logits
Token       Logit
ken         -0.18
ynam        -0.15
겠          -0.15
Blackburn   -0.15
ahl         -0.14
객          -0.14
acci        -0.14
kola        -0.14
woff        -0.13
ศร          -0.13
Positive Logits
Token       Logit
oten        0.15
Cru         0.15
space       0.15
ilin        0.14
섭          0.14
thrown      0.14
sd          0.14
CCA         0.14
allery      0.14
site        0.14
Activations Density 0.385%