INDEX
    Explanations

    Start of text

    Negative Logits
    (blank)       -0.09
    (blank)       -0.09
    (blank)       -0.08
    "smål"        -0.08
    " guh"        -0.08
    "ений"        -0.08
    "_species"    -0.08
    " ג"          -0.08
    "imeve"       -0.08
    "վող"         -0.07
    Positive Logits
    " Kate"       0.08
    " baki"       0.08
    " Är"         0.08
    " SDA"        0.08
    "roof"        0.08
    " arriv"      0.07
    " Ramadan"    0.07
    " communic"   0.07
    "Kate"        0.07
    " marina"     0.07
    Act Density 0.021%

    No Known Activations