    Explanations

    time to take effect

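    Negative Logits
    Token                    Logit   English gloss
    "cos"                    -0.08
    "petto"                  -0.07
    " قول"                   -0.07   "saying" (Arabic)
    (token not rendered)     -0.07
    " *>"                    -0.07
    (token not rendered)     -0.07
    "宝贵的"                 -0.07   "precious, valuable" (Chinese)
    " reconoc"               -0.06   "recogn-" (Spanish stem)
    " AAC"                   -0.06
    "rightarrow"             -0.06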
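    Positive Logits
    Token                    Logit   English gloss
    " под"                   0.07    "under" (Russian)
    "帮忙"                   0.07    "help" (Chinese)
    "_activation"            0.07
    "سياس"                   0.06    "polit-" (Arabic stem, as in "politics")
    " العلاقة"               0.06    "the relationship" (Arabic)
    "IsActive"               0.06
    "под"                    0.06    "under" (Russian)
    "liament"                0.06
    ".cp"                    0.06
    " openings"              0.06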
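The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal Python sketch of that ranking step, assuming the per-token scores are already computed; in SAE dashboards they are typically the feature's decoder direction multiplied by the model's unembedding matrix, but that is an assumption here. `split_top_logits` is a hypothetical helper, not part of any dashboard API.

```python
# Minimal sketch: split per-token logit scores into the k most negative
# and k most positive entries, like the two lists above.
# ASSUMPTION: scores are given; how they were computed (e.g. decoder
# direction @ unembedding) is outside this sketch.

def split_top_logits(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    # Ascending sort: most negative scores come first.
    negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    # Descending sort: most positive first; ties keep insertion order.
    positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    return negative, positive


# Usage with a few values taken from the lists above.
scores = {"cos": -0.08, "petto": -0.07, " под": 0.07, "帮忙": 0.07, "liament": 0.06}
neg, pos = split_top_logits(scores, 2)
print(neg)  # [('cos', -0.08), ('petto', -0.07)]
print(pos)  # [(' под', 0.07), ('帮忙', 0.07)]
```

Sorting the full score dictionary is fine at this scale; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) avoids sorting everything.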
    Activation density: 0.181%
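Activation density is usually read as the share of evaluated tokens on which the feature fires. A minimal sketch, assuming "fires" means a strictly positive activation and that the figure is reported as a percentage; `activation_density` and the 100,000-token sample are hypothetical, chosen only to reproduce a value like 0.181%.

```python
# Minimal sketch of an activation-density figure such as "0.181%".
# ASSUMPTION: density = percentage of tokens whose activation is > 0.

def activation_density(activations: list[float]) -> float:
    """Percentage of tokens with a strictly positive activation."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: 181 active tokens out of 100,000.
acts = [0.0] * 99_819 + [0.5] * 181
print(f"{activation_density(acts):.3f}%")  # 0.181%
```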

    No Known Activations