Explanations
Expressions and phrases indicating skepticism or questioning of narratives.
Negative Logits
apl       -0.16
leh       -0.15
ije       -0.14
inski     -0.14
تف        -0.14
uby       -0.14
ッ        -0.13
(&        -0.13
isp       -0.13
comed     -0.13
Positive Logits
자                 0.16
-widgets           0.14
strains            0.14
catid              0.14
verg               0.13
validationResult   0.13
ackages            0.13
loy                0.13
conse              0.13
spoilers           0.13
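Lists like the Negative Logits and Positive Logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding and ranking tokens by the resulting dot product. The sketch below assumes exactly that; the names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, the toy vectors are not real weights, and real dashboards may additionally fold in layer-norm scaling, which is omitted here.

```python
# Hypothetical sketch: deriving a feature's top/bottom logit tokens.
# decoder_dir stands in for the feature's decoder vector (d_model floats);
# unembed maps each token string to its unembedding column (d_model floats).


def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding column)} for every token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Split effects into the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # strongest suppression ("Negative Logits")
    positive = ranked[::-1][:k]    # strongest promotion ("Positive Logits")
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up vectors, not model weights.
    direction = [1.0, -1.0]
    vocab = {"a": [0.2, 0.0], "b": [0.0, 0.3], "c": [0.1, 0.1]}
    pos, neg = top_and_bottom(logit_effects(direction, vocab), k=1)
    print(pos, neg)
```

A positive value means the feature pushes that token's logit up when it fires; a negative value means it pushes it down.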
Activation density: 0.000%
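Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. The minimal sketch below assumes that definition; `activation_density` and its `threshold` parameter are hypothetical names, not a specific library's API. Assuming the dashboard rounds to three decimals, a displayed 0.000% means the feature fired on fewer than 0.0005% of sampled tokens (under about 1 in 200,000), possibly none.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    sample = [0.0, 0.0, 0.5, 0.0]  # made-up activations over four tokens
    print(f"{activation_density(sample):.3f}%")  # prints 25.000%
```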