    Explanations

    quotations within the text

    Negative Logits
     favor         -0.82
     parting       -0.81
     slam          -0.80
     pudding       -0.79
     adjud         -0.78
     periodic      -0.74
     brid          -0.74
     classified    -0.74
     prec          -0.74
     powerhouse    -0.74
    Positive Logits
    We             1.59
    It             1.59
    They           1.56
    Because        1.53
    I              1.51
    There          1.50
    Obviously      1.47
    Nobody         1.45
    Our            1.44
    You            1.44
    Activation Density 0.484%

    No Known Activations