INDEX
    Explanations
    NEGATIVE LOGITS
    "$img"      -0.08
    " yasal"    -0.08
    "ічної"     -0.07
    "=os"       -0.07
    "ジア"      -0.07
    " std"      -0.07
    " som"      -0.07
    "-ups"      -0.07
    "Fs"        -0.07
    -0.06
    POSITIVE LOGITS
    " before"   0.18
    " Before"   0.14
    "before"    0.13
    "Before"    0.12
    " BEFORE"   0.11
    "_before"   0.10
    "-before"   0.09
    "(before"   0.08
    " Fear"     0.08
    " قبل"      0.07
    Act Density 0.049%

    No Known Activations