    NEGATIVE LOGITS
    apist        -0.07
    Concept      -0.07
    تماد         -0.07
     ̄ ̄          -0.07
    Labels       -0.07
    Hold         -0.07
     ̄ ̄ ̄         -0.06
    -New         -0.06
     наяв        -0.06
     ویرایش      -0.06
    POSITIVE LOGITS
    jectory          0.07
     trainable       0.07
    STYLE            0.06
     chy             0.06
    particularly     0.06
    (blank)          0.06
     Knowledge       0.06
     responsibility  0.06
    Rot              0.06
     lastIndex       0.06
    Activation Density: 0.002%

    No Known Activations