INDEX
Explanations
phrases indicating denial or contradiction in narratives
belief and what people know
Negative Logits
ConstraintMaker    -0.60
surla              -0.50
ModelExpression    -0.48
SOUNDBITE          -0.47
témoins            -0.47
GenerationType     -0.47
AnchorStyles       -0.47
ContentAsync       -0.46
témoignages        -0.45
หน้านี้              -0.45
Positive Logits
otomatig      0.44
endphp        0.42
Хьажоргаш     0.38
цездатний     0.35
об            0.35
ksi           0.34
Tac           0.33
بوس           0.33
Tac           0.32
KSI           0.32
Activations Density: 0.087%