INDEX
Explanations
phrases indicating certainty or lack of doubt
Negative Logits
Token       Logit
оз          -0.14
basePath    -0.14
umer        -0.14
bourg       -0.14
Drv         -0.14
âĬ          -0.14
sublic      -0.13
té          -0.13
ooth        -0.13
429         -0.13
Positive Logits
Token       Logit
doubt       0.40
deny        0.33
Doub        0.32
denying     0.30
dispute     0.30
doubts      0.28
deny        0.27
disput      0.26
denies      0.25
denial      0.25
Activations Density: 0.046%