INDEX
    Explanations

    nouns associated with positive attributes or evaluations

    NEGATIVE LOGITS
    Token       Logit
    "AGED"      -0.15
    " Lyon"     -0.15
    " langu"    -0.15
    "qu"        -0.14
    " Gle"      -0.14
    "785"       -0.14
    "inals"     -0.14
    "/be"       -0.14
    "utsch"     -0.14
    "own"       -0.14
    POSITIVE LOGITS
    Token       Logit
    "overall"   0.19
    " overall"  0.17
    "abela"     0.15
    "ucch"      0.15
    " Overall"  0.15
    "anner"     0.15
    "Overall"   0.15
    "ãĥ¥ãĥ¼"    0.15
    "RIORITY"   0.15
    "illis"     0.15
    Act Density 0.149%

    No Known Activations