    Explanations

    proper nouns, specifically names of people

    Negative Logits
    " ویکی‌پدی"         -0.48
    "دانشنامهٔ"         -0.48
    "restle"           -0.44
    "ighorn"           -0.44
    " Goy"             -0.42
    "tfrac"            -0.42
    " Grot"            -0.41
    "FromArgb"         -0.41
    " Hadd"            -0.40
    "neth"             -0.40

    Positive Logits
    " himself"         0.64
    "himself"          0.59
    " himſelf"         0.59
    "但他"             0.55
    "anyahu"           0.53
    " Constitucional"  0.53
    " Obrador"         0.52
    "Jährige"          0.47
    " kardeş"          0.46
    "linawan"          0.46
    Activation Density: 0.089%

    No Known Activations