INDEX
    Explanations
    Negative Logits
    ,    0.64
     cierto    0.59
     gitar    0.57
    יים    0.56
    znam    0.56
    就是    0.56
    issimo    0.55
    issimo    0.54
     brillante    0.54
    \    0.54
     appara    0.53
    Positive Logits
    ри    0.79
    ور    0.78
    ون    0.75
    ра    0.71
    вра    0.70
    1    0.70
    გორ    0.66
    ج    0.66
    ری    0.65
    ни    0.64
    Activation Density: 0.004%

    No Known Activations