    Negative Logits
    Token           Logit
    'constraint'    -0.07
    'pickup'        -0.07
    ' halo'         -0.07
    ' Integral'     -0.07
    ' Triple'       -0.07
    ' Increment'    -0.07
    'uncate'        -0.07
    'study'         -0.06
    ' karma'        -0.06
    ' Palmer'       -0.06
    Positive Logits
    Token           Logit
    '56'             0.10
    '55'             0.08
    '78'             0.08
    '75'             0.08
    '83'             0.08
    ' >",'           0.07
    '58'             0.07
    '69'             0.07
    '81'             0.07
    '60'             0.07
    Activation density: 0.226%

    No known activations.