INDEX
    Explanations

    indicative phrases and statements that express decision-making or personal opinions

    Negative Logits
    eton
    -0.17
    aos
    -0.16
    auge
    -0.15
    Cha
    -0.15
    ζ
    -0.15
    γκ
    -0.14
    ¶
    -0.14
    ascimento
    -0.14
    plode
    -0.14
    rzy
    -0.14
    Positive Logits
     instead
    0.25
     Instead
    0.20
    Instead
    0.19
    instead
    0.18
    anter
    0.15
     вмест
    0.14
    asco
    0.14
    的是
    0.14
     only
    0.14
    Hack
    0.14
    Act Density 0.273%

    No Known Activations