INDEX
    NEGATIVE LOGITS
    Token         Logit
    "Restrict"    -0.43
    "íncia"       -0.36
    " potr"       -0.36
    "Deel"        -0.35
    "rsiniz"      -0.35
    "RTSC"        -0.34
    "ドウ"        -0.34
    "IEVE"        -0.34
    " görüntü"    -0.34
    "leos"        -0.34
    POSITIVE LOGITS
    Token         Logit
    " hop"        0.88
    " Hip"        0.81
    " Hop"        0.79
    " hip"        0.77
    "Hop"         0.74
    "Hip"         0.74
    " HOP"        0.68
    "hop"         0.68
    " HIP"        0.65
    " hops"       0.56
    Activation Density: 0.001%

    No Known Activations