INDEX
    Explanations

    proper nouns, specifically names of people

    Negative Logits
    "rements"           -0.58
    "ppled"             -0.52
    "eing"              -0.52
    "timents"           -0.51
    "sproz"             -0.49
    "verhältnisse"      -0.48
    "ajuku"             -0.48
    "agerie"            -0.48
    "isations"          -0.47
    " userManager"      -0.47

    Positive Logits
    " who"              0.51
    " aka"              0.49
    "Hentet"            0.46
    " Nacionales"       0.42
    " himself"          0.40
    "who"               0.39
    " Jurí"             0.39
    "لينكات"            0.38
    " dearest"          0.38
    " Schwier"          0.38
    Act Density 0.283%

    No Known Activations