INDEX
    Explanations
    NEGATIVE LOGITS
     arch       -0.78
     repro      -0.73
     adjud      -0.73
     prelim     -0.71
     separated  -0.71
     scheduled  -0.70
     favor      -0.70
     listed     -0.69
     powered    -0.69
     departed   -0.69
    POSITIVE LOGITS
    We          1.57
    Our         1.49
    They        1.47
    It          1.45
    There       1.45
    Everybody   1.43
    Nobody      1.40
    I           1.40
    Everything  1.40
    What        1.39
    Activation Density 0.570%

    No Known Activations