    Explanations

    Subscripts and superscripts

    Negative Logits
    opa                     -0.08
    _positive               -0.07
     fraught                -0.07
    changer                 -0.06
    .theta                  -0.06
    +')                     -0.06
     вклад                  -0.06
    bad                     -0.06
    direction               -0.06
     homogeneous            -0.06
    Positive Logits
     Jonathan               0.07
    	prop                0.07
     spending               0.07
     NYPD                   0.06
    """↵↵                   0.06
    [token not rendered]    0.06
    ngthen                  0.06
     Luis                   0.06
    [token not rendered]    0.06
    sunuz                   0.06
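Top positive and negative logit lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k highest and lowest scores. The sketch below assumes that method; this page does not state how its values were computed. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, and the tiny vectors are made-up toy values, not this feature's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by the logit effect of a single
# feature direction, assuming score[t] = sum_i decoder_dir[i] * unembed[i][t].
# Plain lists keep it dependency-free; a real model would use a matrix multiply.

def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary token (decoder_dir @ unembed)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores, tokens, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores, most negative first
    positive = ranked[::-1][:k]  # highest scores, most positive first
    return positive, negative

# Toy example: d_model = 2, vocabulary of four tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
unembed = [
    [0.5, 0.1, -0.2, 0.0],
    [0.1, 0.3, 0.2, -0.5],
]
scores = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(scores, tokens, k=2)
print([t for t, _ in pos])  # ['d', 'a']
print([t for t, _ in neg])  # ['c', 'b']
```

Here the scores come out to roughly 0.4, -0.2, -0.4, and 0.5, so "d" and "a" rank as the top positive tokens and "c" and "b" as the most negative.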
    Activation Density 0.017%
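Activation density is commonly reported as the percentage of sampled tokens on which a feature's activation is above zero. The sketch below assumes that definition and a threshold of 0.0. The function name `activation_density` is hypothetical, and the example counts (17 active tokens out of 100,000) are invented only to reproduce the 0.017% figure shown above, not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Illustrative numbers only: 17 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(17):
    acts[i * 1000] = 1.5  # mark 17 distinct positions as active
print(f"{activation_density(acts):.3f}%")  # 0.017%
```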

    No Known Activations