INDEX
    Negative Logits
    Token          Logit
    " крас"        0.44
    (missing)      0.41
    " Bhojpuri"    0.39
    (missing)      0.39
    "areti"        0.39
    "𝓲"            0.39
    " đỏ"          0.39
    (missing)      0.39
    "𝒊"            0.39
    "anju"         0.38
    Positive Logits
    Token          Logit
    " Black"       1.28
    " black"       1.26
    "Black"        1.19
    (missing)      1.16
    "black"        1.14
    (missing)      1.14
    " BLACK"       1.13
    " কালো"        1.11
    " 블랙"         1.09
    (missing)      1.09
    Activation Density: 0.019%

    No Known Activations