INDEX
    Explanations

    words related to political discussions and actions

    Negative Logits
    hof    -0.17
    ãĥģ    -0.15
     Barber    -0.15
    py    -0.15
    iek    -0.14
     cap    -0.14
    dbus    -0.14
    etler    -0.13
    609    -0.13
    /release    -0.13
    Positive Logits
    wner    0.17
    wer    0.16
     tượng    0.15
    erif    0.15
     patron    0.15
    WARE    0.15
    è¯Ŀ    0.15
    WER    0.14
     TMPro    0.14
    arl    0.14
    Act Density 0.018%

    No Known Activations