Explanations
content urging readers to check something out or follow a call to action
phrases encouraging the reader to check out more content or articles
Negative Logits
suscept   -0.71
onte      -0.67
lishes    -0.64
escent    -0.62
essen     -0.61
OF        -0.61
hene      -0.60
oint      -0.58
reluct    -0.57
iga       -0.57
Positive Logits
mate      1.18
out       1.18
out       1.01
boxes     0.96
back      0.95
lists     0.94
mates     0.91
outs      0.86
points    0.84
OUT       0.84
Activations Density: 0.023%