    Explanations

    comparisons indicating superiority or excellence

    comparisons indicating superiority or preference

    Negative Logits
    Token         Logit
    "uto"         -0.74
    "urther"      -0.71
    "ALE"         -0.68
    "ERN"         -0.65
    "eeper"       -0.65
    "ensor"       -0.64
    "iosyn"       -0.64
    "ango"        -0.64
    "uria"        -0.64
    "imb"         -0.63
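    Positive Logits
    Token         Logit
    " anybody"    0.83
    " anything"   0.83
    " anyone"     0.80
    " ever"       0.80
    " usual"      0.79
    " useless"    0.76
    " ours"       0.73
    " average"    0.71
    " placebo"    0.71
    "average"     0.71
    (A leading space inside the quotes marks a word-initial token, so " average" and "average" are distinct tokens.)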
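    The dashboard does not say how these logit lists are produced. A common approach for SAE features, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder vector with each column of the model's unembedding matrix (W_U) and then sort. The sketch below illustrates that assumption on toy numbers; the function names, the 2-dimensional vectors, and the four-token vocabulary are all hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction promotes (positive) or suppresses (negative) them.
# ASSUMPTION: effect(token) = decoder_vec . W_U[:, token]; the real
# dashboard's procedure is not stated on this page.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: effect}, where unembed has shape (d_model, len(vocab))."""
    d_model = len(decoder_vec)
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example with made-up numbers (not the real model weights).
vocab = [" anybody", " ever", "uto", "urther"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.5, 0.6, -0.4, -0.3],   # row for model dimension 0
    [0.6, 0.2, -0.6, -0.2],   # row for model dimension 1
]

top, bottom = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print([tok for tok, _ in top])     # [' anybody', ' ever']
print([tok for tok, _ in bottom])  # ['uto', 'urther']
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with real weights, the vocabulary is simply much larger.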
    Activation Density: 0.091%
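    Activation density is usually read as the fraction of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero and uses a made-up sample of 100,000 tokens; the page does not state its sample size or threshold, and `activation_density` is a hypothetical helper. At 0.091%, roughly 91 of every 100,000 tokens would activate the feature.

```python
# Hypothetical sketch: activation density = firing positions / total positions.
# ASSUMPTION: a position "fires" when its activation exceeds the threshold (0.0).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up sample: 91 firing positions spread across 100,000 tokens.
acts = [0.0] * 100_000
for i in range(91):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.091%
```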

    No Known Activations