    Negative Logits
    Token          Logit
    " arch"        -0.78
    " grammar"     -0.65
    " appro"       -0.65
    " seasoned"    -0.64
    " finished"    -0.63
    " repro"       -0.63
    " grades"      -0.62
    " overlook"    -0.62
    " filib"       -0.62
    " lamp"        -0.61
    Positive Logits
    Token          Logit
    "We"           1.21
    "Our"          1.15
    "They"         1.06
    "There"        1.05
    "It"           1.05
    "I"            1.03
    "Everything"   1.02
    "Today"        1.02
    "What"         1.01
    "Operation"    0.99
    Activation Density: 0.106%

    No Known Activations