Explanations
testimony and statements made by individuals in different situations
Negative Logits
Token     Logit
ptives    -0.83
adesh     -0.67
estern    -0.66
theless   -0.65
prus      -0.64
uador     -0.63
celona    -0.61
speech    -0.61
ーテ      -0.60
FTWARE    -0.60
Positive Logits
Token     Logit
,,        0.77
,         0.75
*,        0.73
convinc   0.65
bluntly   0.61
.,        0.61
!,        0.59
goodbye   0.58
Zup       0.55
,...      0.55
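The negative and positive lists above are the bottom and top of a single ranking over the vocabulary. A minimal sketch of how such lists are commonly produced follows. It assumes the feature's decoder direction is projected through the model's unembedding matrix, so each value approximates the change in a token's logit per unit of feature activation. The names `logit_effects`, `w_dec_row`, and `w_unembed` are hypothetical. The sketch ignores the final layer norm and any normalization that sets the scale of the values shown here.

```python
import numpy as np

def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: effect = decoder direction @ unembedding (no layer-norm folding).
    w_dec_row: (d_model,) decoder vector of the feature (hypothetical input)
    w_unembed: (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab: list of token strings, length vocab_size
    Returns (positive, negative): the k largest and k smallest (token, value) pairs.
    """
    effects = w_dec_row @ w_unembed          # shape (vocab_size,)
    order = np.argsort(effects)              # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with random weights (not the real model behind this feature).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec_row = rng.normal(size=d_model)
w_unembed = rng.normal(size=(d_model, vocab_size))
pos, neg = logit_effects(w_dec_row, w_unembed, vocab, k=3)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and reading both ends keeps the two lists consistent: every token in the negative list has a lower value than every token in the positive list.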
Activation Density: 0.152%
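A density of 0.152% means the feature fires on roughly 1.5 of every 1,000 tokens. The sketch below assumes density is defined as the fraction of sampled token positions where the feature's activation is strictly positive. The token sample behind the 0.152% figure is not given here, and the helper `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature is active.

    Assumption: a position counts as active if its activation > threshold.
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy data: the feature is active on 152 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:152] = 1.3
density = activation_density(acts)
print(f"Activation density {density:.3%}")  # prints "Activation density 0.152%"
```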