INDEX
    Explanations
    Negative Logits
    Logit   Token
    -0.07   " док"
    -0.06   " duyg"
    -0.06   " Work"
    -0.06   "ิจกรรม"
    -0.06   "Williams"
    -0.06   " volupt"
    -0.06   "場合"
    -0.06   "taient"
    -0.06   "scriber"
    -0.06   " هدف"
    Positive Logits
    Logit   Token
    0.07    "_streams"
    0.06    "known"
    0.06    " timedelta"
    0.06    " recreate"
    0.06    ".footer"
    0.06    ".Input"
    0.06    "youtube"
    0.06    "EXT"
    0.06    "んだ"
    0.06    "まと"
    Activation Density: 0.001%

    No Known Activations