INDEX
Explanations
statements of opinion or declaration
Negative Logits
Staub       -0.82
']")        -0.77
`]          -0.74
</table>    -0.69
FLO         -0.67
.-.         -0.67
'>";        -0.66
Thornton    -0.66
Gera        -0.66
POL         -0.65
Positive Logits
SAY         1.64
SAY         1.61
Says        1.57
says        1.55
Say         1.52
say         1.52
Saying      1.51
Say         1.49
say         1.47
Says        1.46
Activations Density: 0.099%