    Negative Logits
     sog           -0.07
     hypertension  -0.06
    AMP            -0.06
    Trace          -0.06
    amped          -0.06
     Только        -0.06
    Um             -0.06
     sale          -0.06
    Ya             -0.06
    _Comm          -0.06
    Positive Logits
    >n             0.06
    Decoration     0.06
    ?.             0.06
    >↵↵↵↵          0.06
    :value         0.06
    /format        0.06
    ...↵↵↵↵↵↵      0.06
    voří           0.06
     работ         0.06
     내려           0.06
    Activation Density: 0.026%

    No Known Activations