    Negative Logits
     наб        -0.08
    _ct         -0.07
    _where      -0.07
    Energy      -0.07
     TRE        -0.07
     Constit    -0.07
    Grouping    -0.07
    .energy     -0.07
    _joint      -0.07
    duğu        -0.07
    Positive Logits
     Ani        0.08
     delights   0.08
     proib      0.08
     tapestry   0.08
     tomu       0.08
                0.07
                0.07
     EG         0.07
     delight    0.07
     perf       0.07
    Activation Density: 0.012%

    No Known Activations