INDEX
    Negative Logits
    Token               Logit
     Administrativna    -1.23
    <pad>               -0.87
    <unused42>          -0.87
    [@BOS@]             -0.86
    <unused28>          -0.86
    <unused14>          -0.86
    <unused17>          -0.86
    <unused16>          -0.86
    <unused3>           -0.86
    <unused8>           -0.86
    Positive Logits
    Token               Logit
    id                  0.31
    add                 0.28
    (token not shown)   0.26
     com                0.26
    (token not shown)   0.25
     '                  0.25
    ids                 0.24
    i                   0.24
    group               0.24
    C                   0.24
    Activation Density: 0.008%

    No Known Activations