    NEGATIVE LOGITS
    Token          Logit
    "\tmock"       -0.07
    "Edward"       -0.07
    " movie"       -0.07
    " puppet"      -0.07
    " Yin"         -0.07
    " teaches"     -0.07
    " is"          -0.07
    "obe"          -0.06
    " jams"        -0.06
    "\tsp"         -0.06
    POSITIVE LOGITS
    Token              Logit
    "NAMESPACE"        0.07
    "atı"              0.06
    (blank)            0.06
    " //{↵"            0.06
    "numerusform"      0.06
    "istency"          0.06
    " quý"             0.06
    "Pres"             0.06
    "Congratulations"  0.06
    " infringement"    0.06
    Activation Density 0.002%

    No Known Activations