INDEX
    Explanations

    proper nouns, especially people.

    words related to familial relationships and lineage

    Negative Logits
    Token                 Logit
    "InjectAttribute"     -0.85
    " AssemblyProduct"    -0.83
    " nakalista"          -0.78
    " للمعارف"            -0.77
    "########."           -0.77
    " يتيمه"              -0.75
    "RegressionTest"      -0.73
    " ſever"              -0.72
    " myſelf"             -0.71
    "Rohy"                -0.71
    Positive Logits
    Token                 Logit
    " of"                 0.55
    (blank)               0.50
    "üs"                  0.50
    "of"                  0.47
    "ah"                  0.47
    "el"                  0.47
    "nas"                 0.46
    "sius"                0.45
    "AH"                  0.44
    " ("                  0.44
    Act Density 0.903%

    No Known Activations