INDEX
    Explanations

    immune, exempt

    Negative Logits
     prost      -0.08
     preko      -0.08
     Prost      -0.08
    akov        -0.07
    ніверс      -0.07
    eep         -0.07
     escon      -0.07
    Oak         -0.07
    enis        -0.07
     ));↵↵      -0.07
    Positive Logits
     unaffected  0.13
     exempt      0.11
     excluded    0.10
     제외        0.10
     untouched   0.10
     영향을      0.09
     insulated   0.09
     spared      0.09
    受到        0.09
    Excluded    0.09
    Act Density 0.035%

    No Known Activations