INDEX
    Negative Logits
    " Linear"       -0.06
    " Bobby"        -0.06
    " xúc"          -0.06
    "ritic"         -0.06
    " Small"        -0.06
    "obile"         -0.06
    " 기반"         -0.06
    " Alvarez"      -0.06
    " suction"      -0.06
    "vertices"      -0.06
    Positive Logits
    "9"             0.08
    " тобі"         0.07
    "uster"         0.07
    " Initialized"  0.07
    " ninth"        0.07
    "...');↵"       0.07
    "SB"            0.06
    "istingu"       0.06
    " recruitment"  0.06
    "aes"           0.06
    Activation Density: 0.001%

    No Known Activations