INDEX
    Explanations

    phrases related to contributions or contradictions

    Negative Logits
    Token           Logit
    " Monk"         -0.67
    " Reboot"       -0.66
    " Palace"       -0.65
    "ilee"          -0.65
    " lyn"          -0.65
    " bells"        -0.64
    " vows"         -0.63
    " MBA"          -0.63
    " undergrad"    -0.62
    " knee"         -0.62
    Positive Logits
    Token           Logit
    "cont"          3.98
    "Cont"          2.63
    " Cont"         1.79
    "CONT"          1.66
    " CONT"         1.40
    "dist"          1.17
    "contact"       1.17
    " cont"         1.16
    "comp"          1.16
    "controller"    1.11
    Activation Density: 0.006%

    No Known Activations