INDEX
    Explanations

    phrases that indicate agreement or affirmation

    Negative Logits
    uke         -0.15
    orama       -0.15
    олов        -0.14
    ickey       -0.14
    oomla       -0.14
     Fill       -0.14
    culus       -0.13
    @qq         -0.13
     Blanco     -0.13
     Blind      -0.13
    Positive Logits
     LENG       0.17
    engu        0.15
    epad        0.15
     automát    0.15
    raci        0.14
    808         0.14
    уди         0.14
    uden        0.14
    xious       0.14
    aru         0.13
    Act Density 0.057%

    No Known Activations