INDEX
    Explanations

    references to individuals and their characteristics or societal roles

    Negative Logits
    Token        Logit
    "kker"       -0.16
    "lew"        -0.15
    " stuff"     -0.14
    " Honor"     -0.14
    "agne"       -0.13
    " logic"     -0.13
    "ving"       -0.13
    "eri"        -0.13
    "ordo"       -0.13
    "olib"       -0.13
    Positive Logits
    Token        Logit
    " known"      0.43
    "known"       0.42
    " Known"      0.41
    "Known"       0.38
    "-known"      0.35
    "_known"      0.33
    " извест"     0.29   (Russian stem of "known")
    " famous"     0.26
    " bekannt"    0.26   (German "known")
    "KNOWN"       0.25
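The dashboard does not state how the logit values above are computed. A common convention for sparse-autoencoder feature dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix W_U and rank the resulting per-token values. The following is a minimal pure-Python sketch under that assumption; `logit_effects`, `top_logits`, and the toy matrices are hypothetical and not taken from the source.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature direction onto every vocabulary token.

    decoder_dir: list of d_model floats (hypothetical feature decoder direction).
    unembed: d_model x vocab_size matrix as a list of rows (hypothetical W_U).
    Returns one value per vocabulary token: decoder_dir @ W_U.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_logits(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # smallest values first
    positive = ranked[::-1][:k]      # largest values first
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary.
vocab = [" known", "kker", " famous", " stuff"]
effects = logit_effects([1.0, 0.0], [[0.4, -0.2, 0.3, -0.1], [5.0, 5.0, 5.0, 5.0]])
negative, positive = top_logits(effects, vocab, k=2)
# negative == [("kker", -0.2), (" stuff", -0.1)]
# positive == [(" known", 0.4), (" famous", 0.3)]
```

Real dashboards may also fold in a final layer norm or other scaling before ranking; this sketch omits that.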
    Activation Density: 0.006%
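The density figure is read here, as an assumption, as the percentage of token positions in some evaluation corpus where the feature's activation exceeds zero. Under that reading, 0.006% corresponds to about 6 firings per 100,000 tokens. A minimal sketch; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 3 nonzero activations out of 50,000 tokens gives a density of 0.006 (percent).
acts = [0.0] * 49997 + [1.2, 0.7, 3.1]
density = activation_density(acts)
```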

    No Known Activations