    Explanations

    African American Vernacular English

    Negative Logits
    0.61
    ում             0.57
    Об              0.55
    Су              0.55
    João            0.55
    Australia       0.53
    Кар             0.52
    doesn           0.52
    Kết             0.52
    Дру             0.52
    Positive Logits
    in              0.65
    z               0.61
     homem          0.52
    ad              0.52
    em              0.52
     entrepreneur   0.50
     b              0.49
    c               0.49
    f               0.49
     mark           0.48
    Activation Density: 0.002%

    No Known Activations