    Negative Logits
     arch       -0.82
     graz       -0.81
     favour     -0.77
     vault      -0.77
     spr        -0.76
     quir       -0.76
     hauled     -0.76
     listed     -0.76
     vegetarian -0.75
     favor      -0.74
    Positive Logits
    We          1.75
    Our         1.73
    Today       1.57
    Obviously   1.57
    Everybody   1.57
    There       1.56
    Nobody      1.56
    It          1.56
    This        1.55
    They        1.54
    Activation density: 0.121%

    No Known Activations