    Explanations

    phrases indicating research studies and their methodologies

    Negative Logits
    "<bos>"         -1.50
    "Cringe"        -0.72
    " assiste"      -0.70
    " confirme"     -0.65
    "Fuckin"        -0.58
    " feign"        -0.57
    " croit"        -0.56
    " constate"     -0.56
    "/***\n"        -0.55
    " captiv"       -0.54

    Positive Logits
    " maroc"         1.02
    " unwarran"      1.01
    " Keny"          0.99
    " Hez"           0.94
    " bahay"         0.93
    " kani"          0.91
    " saad"          0.90
    " bagay"         0.90
    " mikrofon"      0.89
    " susun"         0.89

    Activation Density: 2.327%

    No Known Activations