INDEX
    Explanations

    references to a specific individual or variations of their name

    Negative Logits
    "herit"   -0.18
    "inois"   -0.17
    "ury"     -0.15
    "utor"    -0.15
    "ADF"     -0.15
    "rone"    -0.14
    "ems"     -0.14
    "ings"    -0.14
    "oria"    -0.14
    "ple"     -0.14
    Positive Logits
    " adj"       0.17
    "lected"     0.15
    "assa"       0.15
    "adir"       0.15
    "аÑģÑģ"      0.14
    "adj"        0.14
    "ingleton"   0.14
    "enek"       0.14
    "åģ"         0.14
    " buck"      0.13
    Act Density 0.018%

    No Known Activations