    Explanations

    statements indicating contradiction or inconsistency in beliefs and actions

    Negative Logits
    sov       -0.15
    apiro     -0.15
    abr       -0.14
    venir     -0.14
     sob      -0.14
    758       -0.14
    aster     -0.14
    suz       -0.14
    603       -0.13
    à¥Ģद      -0.13
    Positive Logits
    /Dk       0.16
    ammen     0.16
    \grid     0.15
    psc       0.14
    icorn     0.14
    ird       0.14
    weit      0.14
     ÐĴики    0.14
    Editable  0.14
    -append   0.14
    Act Density 0.007%

    No Known Activations