    Explanations

    instances of the word "honor" in various forms

    Negative Logits
        Token      Logit
        "ors"      -0.30
        "or"       -0.17
        "ionate"   -0.17
        "tas"      -0.16
        " Pes"     -0.15
        "tul"      -0.15
        "odo"      -0.15
        "ODO"      -0.15
        "tan"      -0.15
        "t"        -0.14
    Positive Logits
        Token      Logit
        "cho"      0.22
        "chos"     0.19
        "ed"       0.19
        "kins"     0.18
        "ester"    0.17
        "edin"     0.17
        "zos"      0.17
        "olulu"    0.17
        "TRL"      0.16
        "obo"      0.16
    Activation density: 0.006%

    No Known Activations