    Explanations

    comparative phrases indicating contrast or opposition

    Negative Logits
    "erness"    -0.15
    "eur"       -0.15
    "уму"       -0.15
    "eline"     -0.14
    "XA"        -0.14
    "ormal"     -0.14
    "astle"     -0.13
    "loud"      -0.13
    "ilton"     -0.13
    "irm"       -0.13
    Positive Logits
    "dap"        0.15
    "aye"        0.14
    "BY"         0.14
    " Mystery"   0.13
    "еж"         0.13
    "ите"        0.13
    "poke"       0.13
    "ikh"        0.13
    "pite"       0.13
    "Cro"        0.13
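The two ranked lists above can be produced with a short routine. This is a minimal sketch that rests on an assumption the dashboard does not state: each logit value is the feature's decoder direction projected through the model's unembedding matrix. The names `logit_effects`, `decoder_direction`, `unembed`, and `vocab` are hypothetical, and the sketch does not claim to be the dashboard's actual implementation.

```python
import numpy as np

def logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption (hypothetical): the feature's direct logit effect is its
    decoder direction (shape: d_model) multiplied by the unembedding
    matrix (shape: d_model x vocab_size).
    """
    logits = decoder_direction @ unembed      # one value per vocabulary token
    order = np.argsort(logits)                # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

With `k=10`, the two returned lists would correspond to the "Negative Logits" and "Positive Logits" columns above.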
    Activation Density: 0.557%
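A density of 0.557% corresponds to the feature firing on roughly 5.6 of every 1,000 tokens. The sketch below shows the calculation under one assumption: density is the percentage of sampled tokens whose activation is above a threshold (here, zero). The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold.

    Assumption (hypothetical): "firing" means activation > threshold.
    """
    activations = np.asarray(activations, dtype=float)
    fired = np.count_nonzero(activations > threshold)  # tokens where the feature fires
    return 100.0 * fired / activations.size
```

For example, a feature that fires on 2 of 8 tokens has a density of 25%.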

    No Known Activations