INDEX
Explanations
instances of the word "publish" and its variations
Negative Logits
instance  -0.16
ed        -0.16
ạp        -0.15
kn        -0.15
etical    -0.15
oh        -0.15
ën        -0.15
ook       -0.14
ass       -0.14
kidd      -0.14
Positive Logits
entar     0.15
æ¬        0.15
hof       0.15
jabi      0.14
krét      0.14
ariat     0.14
á»IJ      0.14
ÙĪÙĦÙĬÙĪ  0.14
ermo      0.14
holm      0.13
Activations Density 0.033%