INDEX
    Explanations

    quotes within double quotation marks

    Negative Logits
     adjud        -0.80
     favor        -0.78
     arch         -0.77
     prec         -0.74
     spr          -0.73
     grid         -0.72
     scheduled    -0.72
     pir          -0.72
     prelim       -0.70
     ranking      -0.70
    Positive Logits
    We            1.74
    It            1.65
    They          1.62
    There         1.62
    Our           1.61
    I             1.58
    Because       1.55
    Everybody     1.53
    Nobody        1.53
    You           1.52
    Activation Density: 1.682%

    No Known Activations