    Explanations

    references to historical figures and significant events related to culture

    Negative Logits
    Token            Logit
    "erti"           -0.16
    "ãģ¾ãģļ"         -0.15
    "ằm"             -0.14
    "ichtet"         -0.14
    "uish"           -0.14
    "abwe"           -0.13
    "zdy"            -0.13
    "argar"          -0.13
    " Newest"        -0.13
    "à¹ĥà¸Ļส"        -0.13
    Positive Logits
    Token            Logit
    " still"         0.66
    "still"          0.57
    " STILL"         0.56
    "Still"          0.53
    " Still"         0.53
    " hâlâ"          0.40
    "ä»į"            0.40
    " now"           0.40
    " ainda"         0.40
    " continues"     0.38
    Activation Density: 0.448%

    No Known Activations
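
Some token strings in the logit lists above (for example `ãģ¾ãģļ` and `ä»į`) look like the raw vocabulary form used by byte-level BPE tokenizers, where every byte is shown as a stand-in printable character. The sketch below assumes the model uses the GPT-2-style byte-to-character table; that is an assumption, not something the page states. The names `byte_to_unicode_table` and `decode_token` are hypothetical helpers introduced here for illustration. Tokens that are already readable text, or that do not decode as valid UTF-8 under this mapping, are returned unchanged.

```python
def byte_to_unicode_table():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Every remaining byte is assigned the next code point starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse lookup: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def decode_token(token):
    """Map a vocabulary-form token back to text; return it unchanged on failure."""
    try:
        raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
        return raw.decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        # Either a character outside the table or an invalid UTF-8 sequence:
        # treat the token as already-decoded text.
        return token


if __name__ == "__main__":
    for tok in ["ä»į", "ãģ¾ãģļ", " hâlâ", "erti"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `ä»į` decodes to `仍` (Chinese for "still"), which would fit the other positive-logit tokens (` still`, ` hâlâ`, ` ainda`), and `ãģ¾ãģļ` decodes to `まず`.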