INDEX
    Explanations
    Negative Logits
    (no visible token)    -0.08
    " ọwọ"    -0.08
    (no visible token)    -0.08
    "қай"    -0.07
    (no visible token)    -0.07
    "咨询"    -0.07
    "监管"    -0.07
    " spoke"    -0.07
    "tun"    -0.07
    " tutu"    -0.07
    Positive Logits
    "620"    0.09
    " दर"    0.08
    (whitespace)    0.07
    "855"    0.07
    "phrase"    0.07
    " દર"    0.07
    " standardized"    0.07
    ".CODE"    0.07
    "ชีวิต"    0.07
    " rise"    0.07
    Act Density 0.001%
    No Known Activations