    Explanations

    a/an positive adjectives

    Negative Logits
    Token         Logit
    " surprise"   -0.09
    "adel"        -0.09
    "HEME"        -0.08
    "oment"       -0.08
    "engin"       -0.08
    " usur"       -0.08
    "elez"        -0.08
    "æľĭ"         -0.08
    "ruc"         -0.08
    "Ñģок"        -0.08

    Positive Logits
    Token         Logit
    " great"       0.27
    " excellent"   0.24
    " good"        0.23
    " nice"        0.19
    " effective"   0.18
    " perfect"     0.18
    " popular"     0.17
    " ideal"       0.16
    "great"        0.16
    " fun"         0.15
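
The two lists above are the extremes of one per-token score: the lowest values form the negative list and the highest form the positive list. The sketch below shows only that ranking step. It assumes the scores are already available as a mapping from token string to logit effect. On dashboards like this one, those scores are commonly derived by projecting the feature's decoder direction through the model's unembedding, but the source does not say how this dashboard computes them. The function name `top_logits` is hypothetical.

```python
def top_logits(effects, k):
    """Return the k most negative and k most positive (token, value) pairs.

    `effects` is a hypothetical dict mapping token string -> logit effect.
    How those values are computed is an assumption outside this sketch.
    """
    # Sort ascending by value; ties keep their original insertion order.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Usage with a few values copied from the tables above.
effects = {" great": 0.27, " excellent": 0.24, " good": 0.23,
           " surprise": -0.09, "adel": -0.09, "HEME": -0.08}
neg, pos = top_logits(effects, 2)
print(neg)  # [(' surprise', -0.09), ('adel', -0.09)]
print(pos)  # [(' great', 0.27), (' excellent', 0.24)]
```

Quoting the tokens keeps the leading space visible. That space matters because " great" and "great" are distinct tokens with different values.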

    Activation Density: 0.065%
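
Activation density is usually read as the percentage of sampled tokens on which the feature fires. On that reading, 0.065% is about 65 tokens in every 100,000. The sketch below assumes that definition and an activation threshold of 0. The dashboard's actual threshold and sampling procedure are not given in the source. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes density = (# tokens with activation > threshold) / (# tokens),
    expressed as a percentage; the threshold of 0 is an assumption.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Usage: 65 active tokens out of 100,000 gives the density shown above.
acts = [0.0] * 99_935 + [1.0] * 65
print(activation_density(acts))  # 0.065
```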

    No Known Activations