Explanations
questions in the text
questions that prompt engagement or encourage action
Negative Logits
Token            Logit
aling            -0.67
ality            -0.66
foreground       -0.66
jam              -0.66
forth            -0.63
ema              -0.63
iculture         -0.62
athe             -0.62
riel             -0.60
urnal            -0.60
Positive Logits
Token            Logit
Check             1.06
Try               1.00
Become            0.98
Consider          0.94
Nope              0.94
Cancel            0.94
Subscribe         0.92
Congratulations   0.90
Find              0.90
Want              0.89
Activation Density: 0.071%
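Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes that definition and a firing threshold of 0. Neither is stated on this page. It shows the arithmetic behind a figure like 0.071%: 71 active positions out of a hypothetical 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0);
    the dashboard does not state its exact rule.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 71 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(71):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.071%
```

A density this low means the feature fires on fewer than one token in a thousand. Such sparsity is consistent with the narrow explanations listed above.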