INDEX
    Explanations

    variable in equation

    Negative Logits
    " weakening"        -0.10
    " bruis"            -0.09
    " inability"        -0.08
    " strengthening"    -0.08
    "asakan"            -0.08
    " weakened"         -0.08
    " weaker"           -0.08
    " weaken"           -0.08
    " אד"               -0.08
    " itching"          -0.08
    Positive Logits
    "mirror"            0.10
    " Mirror"           0.10
    ".flip"             0.10
    "_flip"             0.10
    "Mirror"            0.09
    " Flip"             0.09
    " mirror"           0.09
    "Flip"              0.09
    " mirrored"         0.09
    "flip"              0.09
    Act Density 0.021%

    No Known Activations