INDEX
Explanations
phrases where someone is expressing an opinion or argument
statements or claims made by individuals
Negative Logits
ptives          -0.72
adesh           -0.70
theless         -0.65
================================================================  -0.63
prus            -0.63
ãĥ¼ãĥĨ          -0.62
estern          -0.60
https           -0.60
uador           -0.60
FTWARE          -0.60
Positive Logits
,,               0.76
*,               0.72
,                0.71
convinc          0.70
bluntly          0.66
goodbye          0.61
!,               0.60
omin             0.59
cyn              0.59
emphatically     0.58
Activations Density 0.187%
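The page does not say how the density figure is computed. A common reading is the share of sampled token positions on which the feature's activation is above zero. The sketch below assumes that definition; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    The source page does not confirm this definition.
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    # Count positions where the feature fires more strongly than the threshold.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.3, 2.1]
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.187% would mean the feature fires on roughly 19 of every 10,000 sampled tokens.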