INDEX
    Explanations

    phrases related to things staying the same or not being affected

    terms related to stability or lack of change

    Negative Logits
    ç«                -0.68
    alez              -0.67
     Typhoon          -0.67
    aph               -0.65
    RH                -0.62
     McKenna          -0.60
    =-=-=-=-=-=-=-=-  -0.60
    ingo              -0.60
    ¯¯                -0.60
    eur               -0.59
    Positive Logits
     unchanged        1.24
     untouched        0.85
     unaffected       0.83
    iated             0.80
    ishment           0.74
    theless           0.73
    aneously          0.71
    ãĤ´               0.70
    iating            0.68
     interpol         0.67
    Act Density 0.006%

    No Known Activations