INDEX
    Explanations

    expressions indicating opposition or contrasting viewpoints

    Negative Logits
    eview           -0.15
     Haz            -0.15
    innacle         -0.15
     overall        -0.14
    anton           -0.14
    ucha            -0.14
    ourg            -0.13
    upert           -0.13
     Perspectives   -0.13
    ertz            -0.13
    Positive Logits
    ines            0.15
    EMU             0.14
     Magnet         0.14
    женер           0.14
    .rules          0.14
    ucker           0.14
    uhan            0.13
    kaar            0.13
    sen             0.13
    uple            0.13
    Activation Density: 0.021%

    No Known Activations