INDEX
    Explanations

    mentions of types, categories, and classifications within various contexts

    Negative Logits
    SequentialGroup   -0.61
     poffible         -0.59
    <unused43>        -0.57
    <unused41>        -0.57
    <unused3>         -0.57
    <unused42>        -0.57
    <unused51>        -0.57
    <unused8>         -0.56
    [@BOS@]           -0.56
    <pad>             -0.56

    Positive Logits
     you               0.49
     he                0.44
     they              0.43
     it                0.41
    ass                0.40
    pa                 0.38
     we                0.38
    you                0.38
     I                 0.37
     للاسماء           0.37
    Act Density 0.018%

    No Known Activations