INDEX
    Explanations

    exceeding a limit

    Negative Logits
    Token            Logit
    " Montana"       -0.08
    " family"        -0.08
    " *↵↵"           -0.08
    " Morgan"        -0.07
    " Sw"            -0.07
    "inematics"      -0.07
    " Family"        -0.07
    "cussion"        -0.07
    " discussing"    -0.07
    " familial"      -0.07
    Positive Logits
    Token            Logit
    " exceed"        0.10
    ".additional"    0.09
    " excess"        0.09
    " exceeds"       0.09
    " stretched"     0.09
    ".tail"          0.09
    " zusätzliche"   0.09
    "_delta"         0.08
                     0.08
    "(offset"        0.08
    Activation Density: 0.015%

    No Known Activations