INDEX
    Explanations

    affirmations and confirmations, particularly in dialogue

    Negative Logits
     ль
    -0.16
     tablesp
    -0.15
    hazi
    -0.15
    icher
    -0.15
     mis
    -0.15
    733
    -0.15
    istra
    -0.14
     Podesta
    -0.14
    ibar
    -0.14
     IMAGE
    -0.14
    Positive Logits
     ZZ
    0.15
    adoo
    0.15
    rof
    0.15
    PWD
    0.14
    anki
    0.14
    хід
    0.13
    badge
    0.13
    coil
    0.13
    ilden
    0.13
    оры
    0.13
    Act Density 0.001%

    No Known Activations