INDEX
    Explanations

    words related to culture and music

    NEGATIVE LOGITS
    ernals
    -0.18
    anio
    -0.16
     Sonata
    -0.15
    ourn
    -0.15
    hiro
    -0.15
     Smoke
    -0.15
    ائز
    -0.15
    arto
    -0.15
    zzo
    -0.15
    isman
    -0.15
    POSITIVE LOGITS
     się
    0.52
     sich
    0.48
     zich
    0.32
     itself
    0.31
    ся
    0.29
     themselves
    0.29
     sig
    0.28
     si
    0.27
    -se
    0.26
     себя
    0.25
    Act Density 0.016%

    No Known Activations