INDEX
    Explanations

    breakdown and explanation

    Negative Logits
    Hold          0.45
                  0.45
    Fail          0.41
                  0.40
                  0.40
                  0.40
    stav          0.39
     المش         0.39
                  0.39
                  0.39
    Positive Logits
     jargon       0.59
     d            0.58
     kenn         0.50
     md           0.50
     algebra      0.50
     col          0.49
     lau          0.48
     sax          0.47
     tabular      0.47
     biography    0.47
    Act Density 0.001%

    No Known Activations