    Explanations

    geographical locations and nationalities

    Negative Logits (token, value)
    " "       0.75
    "ח"       0.69
    " is"     0.68
    "ק"       0.67
    "માં"      0.65
    "정이"    0.60
    " in"     0.56
    (no token shown)  0.56
    "ك"       0.56
    " 동일"   0.55
    Positive Logits (token, value)
    "-"       0.85
    ":"       0.75
    "٥"       0.71
    "il"      0.70
    " delà"   0.67
    "á"       0.64
    "el"      0.63
    (no token shown)  0.63
    "y"       0.62
    "н"       0.61
    Act Density 0.145%

    No Known Activations