INDEX
    Explanations

    words related to expressing support or agreement

    Negative Logits
    anus        -0.76
     Gap        -0.71
    enegger     -0.71
    anned       -0.68
     typo       -0.67
    kefeller    -0.63
     Brist      -0.60
    uilt        -0.59
     Gorge      -0.58
     Shant      -0.57
    Positive Logits
    itism       0.91
    clusion     0.74
     thereof    0.72
    iveness     0.71
    porting     0.69
    anding      0.68
    raints      0.67
    cussion     0.67
    clude       0.66
    ative       0.66
    Act Density 0.033%

    No Known Activations