INDEX
    Explanations

    logical contrapositives

    Negative Logits
     pioneer       -0.09
     erforder      -0.08
     emuls         -0.07
     endgült       -0.07
    acht           -0.07
     Valencia      -0.07
     Genel         -0.07
    rug            -0.07
     Freelancer    -0.07
     agu           -0.07
    Positive Logits
     CWE           0.08
                   0.08
     closures      0.08
     reversed      0.08
     closure       0.08
     alex          0.08
    近平           0.08
     transpose     0.08
     verbs         0.08
     anton         0.08
    Act Density 0.013%

    No Known Activations