    Explanations

    specific names, including proper nouns and abbreviations

    Negative Logits
    Token                      Logit
    "<bos>"                    -1.37
    " intersper"               -1.10
    " forbear"                 -0.76
    " impelled"                -0.76
    " vainly"                  -0.75
    " overcrow"                -0.74
    "/**"                      -0.73
    "equila"                   -0.72
    (whitespace-only token)    -0.71
    " disbur"                  -0.70
    Positive Logits
    Token                      Logit
    " utop"                    0.81
    " cioc"                    0.74
    " Tow"                     0.64
    " Toxicol"                 0.63
    "tke"                      0.62
    "ToTensor"                 0.61
    "TO"                       0.60
    " gmbh"                    0.60
    " africain"                0.60
    " télévis"                 0.59
    Activation Density: 0.277%

    No Known Activations