INDEX
    Explanations
    Negative Logits
    Token             Logit
    " götür"          -0.07
    " Beled"          -0.07
    "ुआत"             -0.07
    "_eta"            -0.07
    "感到"            -0.07
    " conductor"      -0.07
    " perimeter"      -0.07
    ".orientation"    -0.07
    " nabíd"          -0.07
    " kiểm"           -0.07
    Positive Logits
    Token             Logit
    " slave"          0.17
    " Slave"          0.14
    " slaves"         0.11
    "slave"           0.11
    "Slave"           0.11
    "_slave"          0.08
    " Dave"           0.07
    " ensl"           0.07
    "/save"           0.07
    "_SLAVE"          0.07
    Activation Density: 0.003%

    No Known Activations