    Negative Logits
        " tell"        -0.07
        " steal"       -0.07
        " realize"     -0.07
        " lider"       -0.07
        "/************************************************************************"  -0.07
        " persuaded"   -0.06
        " telling"     -0.06
        "-Free"        -0.06
        " Again"       -0.06
        " brides"      -0.06

    Positive Logits
        " capacity"     0.09
        " Capacity"     0.07
        " capacities"   0.07
        " respir"       0.07
        " capability"   0.07
        " propensity"   0.06
        " Aux"          0.06
        " tracing"      0.06
        " ceased"       0.06
        "596"           0.06
    Activation Density: 0.005%

    No Known Activations