    Negative Logits
     provisions    -0.09
                   -0.08
    _br            -0.08
     vivement      -0.08
     öl            -0.08
    Br             -0.08
                   -0.08
                   -0.07
     eager         -0.07
    brace          -0.07
    Positive Logits
     preserved      0.14
     unchanged      0.13
     invari         0.12
     Preservation   0.11
     invariant      0.11
     preservation   0.11
     preserving     0.10
     behouden       0.10
    Invariant       0.10
     Maintaining    0.10
    Activation Density: 0.017%

    No Known Activations