INDEX
    Explanations

    phrases related to comparison, evaluation, and critique

    negative and positive descriptors related to events or situations

    Negative Logits
    Token         Logit
    "ADRA"        -0.56
    " lett"       -0.56
    " conflic"    -0.56
    "roit"        -0.55
    "blast"       -0.54
    " unequ"      -0.51
    " warr"       -0.51
    "ascript"     -0.50
    " Cannot"     -0.49
    "liv"         -0.49
    Positive Logits
    Token         Logit
    " is"         1.33
    " are"        1.09
    " was"        1.01
    " involves"   0.91
    " relates"    0.89
    " revolves"   0.84
    "is"          0.84
    " lies"       0.81
    " consists"   0.77
    " include"    0.76
    Act Density 0.674%

    No Known Activations