INDEX
    Explanations

    terms related to disagreements or dissents

    Negative Logits
    ìĨĶ       -0.08
    ussen     -0.07
    iative    -0.07
    ÏĥμαÏĦα    -0.07
    BOSE      -0.07
    erk       -0.07
     erb      -0.07
     tá»Ń     -0.07
    inand     -0.07
    aggi      -0.07
    Positive Logits
    ively     0.11
    ivity     0.10
    ors       0.09
    ive       0.09
    ection    0.08
     stren    0.08
     raised   0.08
    able      0.08
    535       0.07
    ives      0.07
    Act Density 0.004%

    No Known Activations