INDEX
    Explanations

    words that indicate frequency or repetition

    Negative Logits
     otherwise    -0.15
    âĢİ           -0.14
    oro           -0.14
    ÑĢÑĥп        -0.14
    å¦ĤæŃ¤        -0.14
     OTHERWISE    -0.14
    ãģķãĤī        -0.14
    izens         -0.13
    velt          -0.13
    obot          -0.13
    Positive Logits
     mostly       0.32
    mostly        0.30
     during       0.29
     when         0.29
    sometimes     0.28
     sometimes    0.28
     Mostly       0.27
     whenever     0.27
    when          0.25
    during        0.25
    Act Density 0.033%

    No Known Activations