    Negative Logits
    Token          Logit
    ن              1.02
    n              0.95
     vehemently    0.94
    ла             0.90
    و              0.84
    ول             0.79
    (blank)        0.79
     bebidas       0.78
    ل              0.72
    ו              0.72
    Positive Logits
    Token          Logit
    minValue       0.82
    ocy            0.79
    ž              0.78
    ution          0.72
    $              0.72
    :              0.70
    <unused334>    0.69
    }$             0.68
    //             0.68
    ogen           0.68
    Activation Density: 0.001%

    No Known Activations