INDEX
    Explanations

    affirmative responses and positive confirmations

    Negative Logits
    hal     -0.17
    ton     -0.17
    ect     -0.16
    loo     -0.15
    weg     -0.15
    jes     -0.14
    lo      -0.14
    uma     -0.14
    cin     -0.14
    pton    -0.14
    Positive Logits
    enia    0.17
    Ñĥди    0.17
    Ģìŀ¥    0.17
    nick    0.16
    udas    0.16
    indy    0.15
    gor     0.15
    agher   0.15
    Bias    0.15
    /false  0.15
    Activation Density: 0.043%

    No Known Activations