INDEX
    Negative Logits
    <eos>    -0.57
    featureID    -0.53
     in    -0.50
    dah    -0.47
    KURZBESCHREIBUNG    -0.46
     step    -0.45
     piece    -0.45
     mistake    -0.45
    Curi    -0.45
     question    -0.44
    Positive Logits
     חיצוניים    0.85
    <bos>    0.75
    \{\\    0.70
     AttributeSet    0.67
    الحياه    0.64
    例文帳に追加    0.61
     reads    0.61
    ValueStyle    0.60
     announces    0.59
    rrggbb    0.59
    Act Density 0.045%

    No Known Activations