INDEX
    Explanations

    historical and philosophical terms, potentially related to a specific period or individual in history

    Negative Logits
    " Carbuncle"    -0.78
    "DOWN"          -0.77
    " Nadu"         -0.72
    " Narr"         -0.68
    " eleph"        -0.67
    " GEAR"         -0.66
    " Columbia"     -0.65
    "worthy"        -0.64
    " Dwell"        -0.63
    "uyomi"         -0.62
    Positive Logits
    "vered"         1.09
    "cking"         1.08
    "lder"          1.06
    "elin"          1.04
    "els"           1.03
    "ulner"         1.01
    "ck"            1.00
    "eling"         0.96
    "nder"          0.95
    "clair"         0.94
    Act Density 6.736%

    No Known Activations