INDEX
    Explanations

    let me introduce an action or explanation

    Negative Logits
    Token          Logit
    "ederal"       -0.90
    " negozio"     -0.89
    " Synd"        -0.88
    " ragazzo"     -0.88
    " selam"       -0.88
    " konfig"      -0.88
    " maksimum"    -0.87
    "ystema"       -0.86
    "érées"        -0.86
    "ïc"           -0.85
    Positive Logits
    Token          Logit
    " be"          1.58
    " know"        1.45
    " tell"        1.32
    " help"        1.09
    " first"       1.08
    " explain"     1.08
    "ราบ"          1.05
    " please"      1.01
    "dieran"       0.96
    " сейчас"      0.96
    Activation Density 0.012%

    No Known Activations