INDEX
    Explanations

    phrases related to political ideology and criticism, particularly focusing on deception and manipulation of information

    Negative Logits
    Token                                 Logit
    "zens"                                -0.64
    " Sob"                                -0.64
    "gra"                                 -0.62
    "rawdownloadcloneembedreportprint"    -0.61
    "ivid"                                -0.59
    "stown"                               -0.59
    "chenko"                              -0.58
    " Survey"                             -0.58
    "abulary"                             -0.58
    "pora"                                -0.58
    Positive Logits
    Token            Logit
    " innocuous"     0.87
    " innocence"     0.80
    " invincible"    0.79
    " UL"            0.76
    "OPA"            0.72
    " benign"        0.65
    " harmless"      0.65
    " neutrality"    0.62
    " glamorous"     0.61
    " Cure"          0.61
    Act Density: 16.267%

    No Known Activations