INDEX
    Explanations

    words related to opposition or disagreement

    references to opposing viewpoints or positions

    Negative Logits
    Interstitial    -0.80
    çīĪ             -0.66
     Learns         -0.62
    enegger         -0.62
    KER             -0.61
     Fork           -0.61
     Carnival       -0.61
     Fallen         -0.60
    beit            -0.59
    negie           -0.59
    Positive Logits
    onent           1.65
    osite           1.63
    ortun           1.60
    osition         1.51
    onents          1.48
    osing           1.29
    ressive         1.23
    ression         1.20
    osed            1.17
    inion           1.16
    Activation Density 0.016%
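
    The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero; the function name, the threshold, and the sample data are hypothetical and chosen only so that the result matches the 0.016% shown above:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: 'density' means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 16 of which are active.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 5000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")
```

    Under that assumption, a density of 0.016% corresponds to roughly 16 active positions per 100,000 tokens.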

    No Known Activations