INDEX
    Explanations

    instances of the word "publish" and its variations

    Negative Logits
    " instance"   -0.16
    "ed"          -0.16
    "ạp"          -0.15
    "kn"          -0.15
    "etical"      -0.15
    "oh"          -0.15
    "ën"          -0.15
    "ook"         -0.14
    "ass"         -0.14
    " kidd"       -0.14
    Positive Logits
    "entar"       0.15
    "æ¬"          0.15
    "hof"         0.15
    "jabi"        0.14
    "krét"        0.14
    "ariat"       0.14
    "á»IJ"        0.14
    "ÙĪÙĦÙĬÙĪ"    0.14
    "ermo"        0.14
    "holm"        0.13
    Activation Density: 0.033%

    No Known Activations