INDEX
Explanations
phrases requesting user action or providing instructions
requests for user input or confirmation
Negative Logits
itive     -0.61
driving   -0.55
laus      -0.54
words     -0.52
academ    -0.52
mole      -0.51
amus      -0.51
ulz       -0.50
tro       -0.49
Barker    -0.49
Positive Logits
Cancel      0.85
verify      0.70
refresh     0.69
login       0.64
Subscribe   0.63
try         0.63
subscribe   0.62
Refresh     0.62
reuse       0.62
inbox       0.62
Activations Density: 0.011%