INDEX
    Explanations

    phrases indicating capability or potential actions

    Negative Logits
    aldo        -0.16
    AVA         -0.14
    elho        -0.14
    adele       -0.14
    /comment    -0.14
     Tele       -0.14
     рев        -0.13
    ności       -0.13
    arching     -0.13
    batim       -0.13
    Positive Logits
    -Ass        0.16
    ima         0.15
     Mattis     0.14
    panse       0.14
    aise        0.14
    ojis        0.14
     saja       0.14
    URNS        0.14
    azen        0.14
    nil         0.13
    Act Density 0.152%

    No Known Activations