INDEX
    Explanations

    words indicating contrast or comparison

    Negative Logits
    Token       Logit
     Pyram      -0.65
    PMailer     -0.64
     targ       -0.62
     camb       -0.62
     Quan       -0.61
     pron       -0.61
     fout       -0.61
     Varan      -0.61
     Targ       -0.60
     CPE        -0.60
    Positive Logits
    Token       Logit
     abstrait   0.65
     abstrato   0.59
     Italij     0.58
     zijne      0.56
     aislados   0.52
     démocr     0.52
     étoient    0.52
     avoient    0.52
     Wikiseite  0.51
     infierno   0.51
    Act Density 0.365%

    No Known Activations