INDEX
    Explanations

    information related to historical figures and their roles

    Negative Logits
    ολ               -0.15
    calar            -0.15
    erge             -0.15
    ocratic          -0.15
     addCriterion    -0.14
     downt           -0.14
    oden             -0.14
    imer             -0.14
    ımızda           -0.14
    hatt             -0.14
    Positive Logits
    ige              0.19
    stell            0.17
     Initi           0.17
     borderTop       0.17
     Che             0.16
    idge             0.16
    quier            0.15
    ạ                0.14
     Refer           0.14
    lang             0.14
    Act Density 0.028%

    No Known Activations