Explanations
words related to skepticism or doubt
Negative Logits
Token   Logit
rika    -0.16
aca     -0.15
ouse    -0.15
Fol     -0.15
yun     -0.15
tn      -0.15
alim    -0.14
698     -0.14
ya      -0.14
lix     -0.14
Positive Logits
Token   Logit
ptic    0.29
letal   0.28
Ske     0.21
chers   0.19
letic   0.18
skept   0.18
scept   0.17
ptom    0.17
pch     0.16
-UA     0.16
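The two lists above are the ten tokens with the most negative and most positive logit values for this feature. A minimal sketch of selecting such lists, assuming a per-token effect mapping is already available (how that mapping is computed is not stated here). The token "the" and its value are hypothetical filler; the other entries are copied from the lists above.

```python
import heapq

def top_logits(effects, k=10):
    """Return (top positive, top negative) (token, value) pairs.

    `effects` is assumed to map each token string to this feature's logit value.
    Positive list: largest values first; negative list: most negative first.
    """
    pos = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    neg = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return pos, neg

# Values from the lists above, plus one hypothetical near-zero token ("the").
effects = {"ptic": 0.29, "letal": 0.28, "Ske": 0.21,
           "rika": -0.16, "aca": -0.15, "the": 0.01}
pos, neg = top_logits(effects, k=2)
print(pos)  # [('ptic', 0.29), ('letal', 0.28)]
print(neg)  # [('rika', -0.16), ('aca', -0.15)]
```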
Activation Density: 0.008%
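For scale, a density of 0.008% corresponds to roughly 8 active positions per 100,000 tokens. A minimal sketch of the arithmetic, assuming density means the percentage of token positions where the feature's activation is above zero (the threshold parameter and the synthetic activations are illustrative assumptions, not data from this feature):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes "density" = active positions / total positions * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Synthetic example: 8 nonzero activations among 100,000 token positions.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.008%
```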