    Explanations

    discussions about social and political issues

    Negative Logits
    istically    -1.16
    istic        -0.96
    ists         -0.86
    ism          -0.85
    istical      -0.84
    aries        -0.79
    ist          -0.78
    isers        -0.76
    izes         -0.74
    isation      -0.69

    Positive Logits
    riter         1.15
    ards          1.11
    bour          1.07
    ARDS          1.02
    flake         1.01
    cloth         0.97
    dh            0.96
    pot           0.95
    intosh        0.94
    sie           0.92

    Activation Density: 3.415%

    No Known Activations