INDEX
    Explanations
    Negative Logits
    家族                  -0.08
    Level                 -0.07
     управ                -0.07
    .Vector               -0.07
    іст                   -0.06
     ry                   -0.06
     Environmental        -0.06
    _Rem                  -0.06
    (no token text)       -0.06
     moderators           -0.06
    POSITIVE LOGITS
    <|start_header_id|>   0.07
     tourists             0.07
     promises             0.06
    (no token text)       0.06
    casting               0.06
     lends                0.06
    leston                0.06
     colormap             0.06
    -comment              0.06
    _emb                  0.06
    Act Density 0.000%
    No Known Activations