INDEX
Explanations
references to signing up for newsletters or services
instances of the word "signing" and its variations
Negative Logits
ĸļ          -0.80
agy         -0.79
Islands     -0.74
ILCS        -0.71
irth        -0.68
ooked       -0.68
¬¼          -0.66
Pg          -0.65
Surviv      -0.64
»Ĵ          -0.63
Positive Logits
*/(         1.02
atories     0.96
atory       0.91
ific        0.81
ifying      0.79
chant       0.78
uary        0.76
ifiers      0.75
eering      0.75
ificantly   0.75
Activations Density 0.023%
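To make the density figure concrete, here is a minimal sketch of how such a number could be computed, assuming that "Activations Density" means the fraction of token positions on which the feature fires at all. The page does not define the statistic, so that reading, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is "share of tokens where the feature is active";
    the source page does not state how its figure is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 23 of 100,000 token positions.
example = [1.5] * 23 + [0.0] * 99_977
density = activation_density(example)
print(f"{density:.3%}")  # prints "0.023%"
```

Under this reading, a density of 0.023% would correspond to roughly one active token in every 4,300.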