INDEX
    Explanations

    model responses about capabilities

    Negative Logits
    "Here"      0.60
    " هنا"      0.58
    " aquí"     0.57
    " here"     0.57
    " Here"     0.57
    "HERE"      0.53
    " Aquí"     0.52
    " HERE"     0.51
    " aqui"     0.50
    " tutaj"    0.50
    Positive Logits
    "hexyl"         0.45
    " Talk"         0.40
    "concret"       0.40
    " programs"     0.39
    "adet"          0.38
    " функциони"    0.37
    " Jared"        0.37
    " reviewer"     0.37
    "industrial"    0.36
    " Junk"         0.36
    Activation Density: 0.112%

    No Known Activations