INDEX
    Explanations

    phrases indicating support or endorsement

    references to support or endorsement

    Negative Logits
    Token         Logit
    "anny"        -0.80
    "ogens"       -0.78
    "tein"        -0.77
    "istics"      -0.72
    "odor"        -0.72
    "achu"        -0.70
    "IRO"         -0.70
    "kay"         -0.69
    "oret"        -0.69
    "EMS"         -0.68
    Positive Logits
    Token         Logit
    " backed"      1.10
    " backing"     0.96
    " milit"       0.81
    "drive"        0.79
    "backed"       0.76
    "swing"        0.74
    " arming"      0.73
    "steen"        0.72
    " corrobor"    0.71
    " endorsed"    0.69
    Activation Density: 0.007%

    No Known Activations