INDEX
    Explanations
    Negative Logits
    .work       -0.07
     fearful    -0.06
    .mov        -0.06
     خواه       -0.06
    Waiting     -0.06
     нап        -0.06
                -0.06
    .dao        -0.06
    _round      -0.06
    outputs     -0.06
    Positive Logits
     pristine      0.22
     impeccable    0.21
     immac         0.19
     impecc        0.15
    ulate          0.11
     unmistak      0.11
    istine         0.09
    ibbon          0.07
     aficion       0.07
                   0.07
    Activation Density: 0.002%

    No Known Activations