INDEX
    Explanations

    controversial topics or statements

    references to controversial topics or issues

    Negative Logits
    vation         -0.89
    elsen          -0.87
    nings          -0.78
    á              -0.78
    strings        -0.76
    hower          -0.75
    abetic         -0.73
    minster        -0.73
    ruary          -0.72
    abetes         -0.72

    Positive Logits
     aspects        0.92
     topic          0.91
     topics         0.90
    ity             0.87
     proposition    0.83
     opinions       0.83
     viewpoints     0.82
     views          0.81
     aspect         0.81
     fringe         0.80
    Activation Density: 0.078%

    No Known Activations