    Explanations

    names with non-Western origins

    Negative Logits
     anún           -0.69
    berdayakan      -0.68
    mejores         -0.66
    pantalón        -0.65
    gafas           -0.62
     pouvoit        -0.62
     llorando       -0.61
     ainfi          -0.60
     prohibido      -0.60
     dangereux      -0.59

    Positive Logits
    Leary           0.57
    AxisAlignment   0.55
    expandindo      0.55
                    0.54
    Donnell         0.52
    asanjo          0.50
     Lynx           0.48
    aarrggbb        0.46
     TSM            0.45
    umn             0.45

    Act Density 0.084%

    No Known Activations