INDEX
    Explanations

    phrases related to historical events or political ideologies

    Negative Logits
    Token         Logit
    "é¾įå¥ij士"     -0.80
    "Ö¼"          -0.73
    "mel"         -0.73
    "sburgh"      -0.70
    "baugh"       -0.70
    "avid"        -0.69
    "ensible"     -0.67
    "ãĥ¯ãĥ³"      -0.61
    " markers"    -0.61
    "rored"       -0.61
    Positive Logits
    Token         Logit
    "zzi"         1.04
    "ÄŁ"          1.00
    "qua"         0.95
    " Paulo"      0.91
    "ji"          0.91
    "zeb"         0.91
    "vernment"    0.90
    "qi"          0.87
    "zu"          0.86
    " Zed"        0.86
    Activation Density: 0.025%

    No Known Activations