INDEX
    Explanations
    NEGATIVE LOGITS
    "لينكات"     -0.41
    "roek"       -0.34
    "คม"         -0.32
    " wrong"     -0.31
    "][-"        -0.31
    " gấp"       -0.30
    " Quentin"   -0.29
                 -0.29
    " gồm"       -0.29
    " security"  -0.29
    POSITIVE LOGITS
    " dance"     2.36
    " Dance"     2.20
    "Dance"      2.19
    " DANCE"     2.16
    " danced"    2.08
    "dance"      2.06
    " dancing"   1.99
    " dances"    1.95
    "DANCE"      1.91
    "dancing"    1.84
    Activation Density: 0.081%

    No Known Activations