    Explanations

    words related to contradiction or opposing viewpoints

    Negative Logits
    Token (quoted)    Logit
    "iliz"            -0.18
    "å¡ļ"             -0.17
    "icide"           -0.16
    "uba"             -0.15
    "ãĥ³ãĤ¯"          -0.15
    "istrat"          -0.14
    "scribe"          -0.14
    "ILA"             -0.14
    "ificance"        -0.14
    " im"             -0.14

    Positive Logits
    Token (quoted)    Logit
    " contr"          0.30
    " CONTR"          0.24
    " Contr"          0.21
    "ictory"          0.21
    "Contr"           0.20
    "contr"           0.19
    "ition"           0.18
    "ived"            0.17
    "actions"         0.17
    "433"             0.17

    Activation Density: 0.009%

    No Known Activations