INDEX
    Negative Logits
    "living"    -0.09
    " pae"      -0.08
    "_sched"    -0.08
    " sched"    -0.07
    "ieter"     -0.07
    "Wash"      -0.07
    " Tooth"    -0.07
    "sched"     -0.07
    " asses"    -0.07
    " τέ"       -0.07
    Positive Logits
    (no token shown)    0.08
    (no token shown)    0.08
    " mate"             0.08
    " gyro"             0.08
    (no token shown)    0.07
    " calor"            0.07
    "are"               0.07
    " Pros"             0.07
    "ге"                0.07
    (no token shown)    0.07
    Act Density 0.003%

    No Known Activations