INDEX
    Explanations
    Negative Logits
    Token            Logit
    "ong"            1.05
    "ש"              0.96
    "ال"             0.95
    " hugs"          0.94
    " tattooed"      0.93
    " kneeling"      0.93
    "ли"             0.92
    "غ"              0.92
    "()"             0.92
    " illustrious"   0.92
    Positive Logits
    Token            Logit
    "ども"           0.96
    "优质"           0.91
    "endo"           0.87
    "óln"            0.86
    " misalnya"      0.86
    "िक"             0.85
    " adecuado"      0.85
    "くに"           0.85
    " جيد"           0.84
    " atributo"      0.83
    Activation Density: 0.000%

    No Known Activations