INDEX
    Explanations

    words related to judgment or evaluation

    negations and expressions of skepticism or doubt

    Negative Logits
    Token            Logit
    "iets"           -0.94
    "regation"       -0.89
    "glers"          -0.87
    "ortmund"        -0.82
    "atari"          -0.80
    "umps"           -0.80
    "phia"           -0.79
    "Ws"             -0.78
    "olphins"        -0.77
    "acas"           -0.76

    Positive Logits
    Token            Logit
    " downright"     1.24
    " impractical"   1.20
    " addictive"     1.20
    " impossible"    1.13
    " profitable"    1.13
    " dangerous"     1.10
    " inspiring"     1.10
    " irresistible"  1.09
    " scary"         1.08
    " desirable"     1.07
    Activation Density: 0.334%

    No Known Activations