INDEX
    Explanations

    text related to philosophical, political, or historical contexts

    Negative Logits
     predec      -0.70
     buggy       -0.70
     scatter     -0.67
     stricken    -0.66
     lodging     -0.65
     shroud      -0.64
     clad        -0.64
     neglig      -0.63
     closest     -0.63
     decomp      -0.63
    Positive Logits
    º    1.23
    £    1.09
    ¹    1.07
    ¡    0.94
    ®    0.94
    į    0.91
    ¬    0.91
    »    0.90
    Ĵ    0.89
    Ń    0.88
    Activation Density: 0.236%

    No Known Activations