INDEX
    Explanations
    Negative Logits
     boat       -0.07
    oris        -0.06
     Worse      -0.06
    -vis        -0.06
     Fleming    -0.06
    вою         -0.06
    imb         -0.06
     bourgeois  -0.06
     Вики       -0.06
    arking      -0.06
    Positive Logits
    	check      0.07
    .hidden     0.07
    .social     0.06
     pek        0.06
     giáo       0.06
    istent      0.06
     NumberOf   0.06
    anela       0.06
     GEO        0.06
     hộ         0.06
    Activation Density: 0.007%

    No Known Activations