INDEX
    Explanations
    NEGATIVE LOGITS
    <pad>          -0.67
    <unused23>     -0.67
    <unused41>     -0.67
    tvguidetime    -0.67
    <unused28>     -0.66
    <unused17>     -0.66
    <unused8>      -0.66
    <unused14>     -0.66
    <unused3>      -0.66
    [@BOS@]        -0.66
    POSITIVE LOGITS
     Tourisme      0.40
     veau          0.38
     Mathem        0.35
     pattes        0.35
     morts         0.33
    UNRELATED      0.33
     Vienne        0.32
     lapin         0.31
     moulin        0.31
     cheval        0.30
    Activation Density: 0.009%

    No Known Activations