INDEX
    Explanations

    references to significant historical figures and events

    Negative Logits
    " itself"   -0.20
    "quine"     -0.17
    " koje"     -0.17
    "ibri"      -0.16
    "ear"       -0.15
    "estroy"    -0.15
    "ocale"     -0.15
    "å®ĥ们"    -0.15
    "ίθ"        -0.15
    " stalo"    -0.14
    Positive Logits
    " himself"   0.31
    " whom"      0.28
    " his"       0.22
    " who"       0.20
    "/her"       0.20
    " whose"     0.20
    " Himself"   0.18
    "his"        0.17
    " ÙĨÙ쨳Ùĩ"  0.17
    " jeho"      0.16
    Activation Density: 0.460%

    No Known Activations