INDEX
Explanations
the presence of the word "publish" or its variants
forms of the word "publish" and its derivatives
Negative Logits
overhe      -0.63
crystal     -0.61
prev        -0.61
overall     -0.61
gearing     -0.60
crystals    -0.60
matrix      -0.59
proxy       -0.59
equival     -0.58
concent     -0.58
Positive Logits
lish        4.86
lishing     2.12
lishes      1.82
lished      1.58
lisher      1.40
lez         1.20
rious       1.03
lyak        1.02
bish        0.99
leness      0.96
Activations Density 0.013%