Explanations
email subscription-related text and prompts
Negative Logits
ested      -0.68
footed     -0.65
esan       -0.62
Dwar       -0.61
cule       -0.60
ried       -0.59
rans       -0.58
utenant    -0.57
Ames       -0.56
dain       -0.56
Positive Logits
scribe          0.75
Interstitial    0.74
taboola         0.70
CHAT            0.70
unsub           0.70
subscribe       0.67
ulate           0.66
ences           0.65
uncond          0.65
iatus           0.65
Activations Density: 3.700%