INDEX
    Explanations

    symbols followed by capitalized words

    Negative Logits
    Token      Value
    " ("       0.66
    "،"        0.52
    " is"      0.52
    " a"       0.46
    (blank)    0.42
    " ([["     0.42
    (blank)    0.42
    " (${"     0.42
    " this"    0.41
    " of"      0.40
    Positive Logits
    Token      Value
    "in"       0.59
    (blank)    0.57
    "ون"       0.56
    "the"      0.55
    "ad"       0.54
    "ap"       0.53
    "f"        0.52
    "an"       0.50
    "export"   0.50
    "w"        0.49
    Act Density 0.387%

    No Known Activations