INDEX
Explanations
negative sentiment and implications in the text
Negative Logits
Token         Logit
↵             -0.45
…             -0.40
*/            -0.39
s             -0.39
↵↵            -0.39
…             -0.39
*             -0.39
...           -0.38
↵↵↵           -0.37
\             -0.36
Positive Logits
Token         Logit
queſta        1.03
<unused8>     1.02
[@BOS@]       1.01
<unused14>    1.01
<unused43>    1.01
<unused51>    1.01
<unused42>    1.01
<unused41>    1.01
<unused16>    1.01
<unused28>    1.01
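This page does not say how the logit lists are computed. A common approach, assumed here, scores every vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector. It then reports the lowest and highest scores. The function `logit_effects` and its arguments are hypothetical, and plain Python lists stand in for the model's weight matrices.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],  # shape: d_model x vocab_size
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (not stated on the dashboard): a token's score is the dot
    product of the feature's decoder direction with that token's
    unembedding column.
    """
    scores = []
    for token_id, token in enumerate(vocab):
        # Column token_id of the unembedding matrix is this token's vector.
        column = [row[token_id] for row in unembedding]
        score = sum(d * u for d, u in zip(decoder_direction, column))
        scores.append((token, score))

    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive
```

On a real model, this dot product would be computed over the full vocabulary at once with a matrix multiply. The loop is kept only for readability.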
Activation Density: 0.024%
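Activation density is commonly read as the fraction of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, set by the hypothetical `threshold` parameter. Under that reading, 0.024% corresponds to roughly 24 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)
```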