INDEX
    Explanations

    expressions of various emotions or attitudes towards a specific topic or situation

    expressions of concern or disapproval regarding various issues

    Negative Logits
     Goodwin        -0.69
     Blooming       -0.63
     constructed    -0.62
     Narr           -0.60
     Glover         -0.59
     Grain          -0.59
     sear           -0.58
     waterfall      -0.57
     shoe           -0.56
    agog            -0.56
    Positive Logits
    rompt           0.81
    bia             0.76
    idad            0.76
    llah            0.74
    ociation        0.73
    ilty            0.71
    isexual         0.71
    Letter          0.70
     displeasure    0.70
    oche            0.69
    Act Density 0.163%

    No Known Activations