INDEX
    Explanations

    code and math

    Negative Logits
    Token             Logit
    ".au"             -0.06
    ".Do"             -0.06
    " CFR"            -0.06
    " Sensor"         -0.06
    ".do"             -0.06
    " rough"          -0.06
    "(identity"       -0.06
    "_health"         -0.06
    "Installation"    -0.06
    "Nh"              -0.05
    Positive Logits
    Token             Logit
    "ственно"         0.07
    " releg"          0.07
    " charms"         0.07
    " */↵↵↵↵"         0.07
    (blank)           0.07
    "kul"             0.06
    "ifs"             0.06
    "/card"           0.06
    " teşekkür"       0.06
    (blank)           0.06
    Activation Density: 0.003%

    No Known Activations