INDEX
Explanations
sentences that express strong emotion or feelings
Negative Logits
Token        Logit
ensibly      -0.79
apers        -0.75
graded       -0.73
aper         -0.73
ciplinary    -0.72
¥ŀ           -0.72
undai        -0.71
azo          -0.71
userc        -0.71
asers        -0.70
Positive Logits
Token            Logit
âĢķ              0.98
Asked            0.93
[/               0.91
<|endoftext|>    0.86
Adds             0.84
Cue              0.82
["               0.79
Then             0.78
Hearing          0.77
Lastly           0.77
Activations Density: 0.055%
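
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero, is given here. The function name `activation_density`, the `threshold` parameter, and the sample activations are hypothetical and are not taken from the page; the sample was chosen only so that it reproduces a 0.055% density.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 11 of 20,000 token positions.
sample = [0.0] * 19_989 + [1.3] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.055%
```

Under this reading, a density of 0.055% means the feature is active on roughly 1 token in every 1,800.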