INDEX
    Explanations

    boundaries and exclusions

    Negative Logits
     You                0.39
     ۔                  0.39
     oldu               0.37
    <start_of_image>    0.37
     คุณ                0.37
     Você               0.36
     !                  0.35
    ____________        0.35
    َ                   0.35
                        0.34
    Positive Logits
     without            0.59
     χωρίς              0.56
    without             0.55
    整体                0.52
     WITHIN             0.52
     WITHOUT            0.51
     без                0.49
     regardless         0.48
    /"                  0.47
     irrespective       0.47
    Act Density 0.149%

    No Known Activations