    Negative Logits
    Token    Logit
    基づ    0.51
     এরা    0.49
    пу    0.47
     véritables    0.47
    τους    0.47
    ور    0.47
    データを    0.46
    га    0.46
    fords    0.46
     څرنګوالی    0.45
    Positive Logits
    Token    Logit
     of    0.79
     in    0.71
    A    0.69
     a    0.68
    \    0.63
     it    0.63
     Presidente    0.62
     $    0.59
    ina    0.59
    (    0.55
    Activation Density: 0.036%

    No Known Activations