INDEX
    Explanations

    discussions around contentious social and political topics

    Negative Logits
    Token          Logit
    " anymore"     -0.19
    "ilde"         -0.16
    "raison"       -0.16
    "urent"        -0.16
    "aris"         -0.16
    "åĨį"          -0.15
    "èªł"           -0.15
    "iek"          -0.15
    " Everyday"    -0.15
    " Again"       -0.14
    Positive Logits
    Token            Logit
    " previously"    0.48
    " lately"        0.45
    " recently"      0.42
    " previous"      0.38
    " before"        0.38
    " Previously"    0.34
    " elsewhere"     0.34
    "Previously"     0.31
    "Previous"       0.31
    "before"         0.30
    Activation Density: 0.344%

    No Known Activations