INDEX
    Explanations

    control, source

    Negative Logits
    /x
    -0.07
     Jud
    -0.06
     بما
    -0.06
     офици
    -0.06
     nằm
    -0.06
     meş
    -0.06
    -0.06
    afone
    -0.06
     آش
    -0.06
    ไป
    -0.06
    Positive Logits
    Kal
    0.07
     forgetting
    0.07
    Heart
    0.06
    World
    0.06
    orld
    0.06
     Thompson
    0.06
     Companies
    0.06
     Roll
    0.06
    (Initialized
    0.06
    [undecodable token]
    0.06
    Act Density 0.010%

    No Known Activations