    Negative Logits
    akir  -0.07
    Vak  -0.07
     inertia  -0.07
     الو  -0.07
     какая  -0.07
    Reliable  -0.07
     sme  -0.07
    .annotations  -0.07
    _annotations  -0.07
     continuity  -0.07
    Positive Logits
    clamation  0.08
    」、「  0.08
     Kumar  0.08
     Royaume  0.08
     Ts  0.07
     Lets  0.07
    imeter  0.07
    0.07
     Sons  0.07
     ټول  0.07
    Activation Density: 0.004%

    No Known Activations