INDEX
    Explanations

    "the" followed by specific nouns

    the introduction of specific concepts

    Negative Logits
    Token            Logit
                     0.66
    "6"              0.63
    "5"              0.61
    "}="             0.59
    "}."             0.56
    "4"              0.56
    "-"              0.55
    " erhältlich"    0.54
    "8"              0.53
    " geheel"        0.53
    Positive Logits
    Token      Logit
    " to"      1.17
    " that"    0.82
    " که"      0.69
    " at"      0.68
               0.64
    " it"      0.63
    " be"      0.60
    " of"      0.58
    "د"        0.56
    " by"      0.52
    Act Density 0.592%

    No Known Activations