INDEX
    Explanations

    mentions of political events and figures

    references to political events or figures, particularly concerning campaigns or manifestos

    Negative Logits
    aughs           -0.83
    imer            -0.77
    anus            -0.75
    verages         -0.73
    heses           -0.73
    ndum            -0.73
    witz            -0.72
    gars            -0.72
    ilion           -0.71
    peria           -0.70

    Positive Logits
    æ©              0.81
     domain         0.72
     domains        0.72
    èĪ              0.70
    )=(             0.67
    ãģ®ç            0.66
    entimes         0.65
     strawberries   0.65
     è£ıè           0.65
    ãģ®å            0.64
    Activation Density: 0.137%

    No Known Activations