INDEX
    Explanations

    sources or places where information is conveyed, such as news outlets or interviews

    mentions of news organizations and their publications

    Negative Logits
     veter            -0.64
    animate           -0.59
     perfected        -0.58
     diaper           -0.58
     causal           -0.57
    atible            -0.56
     harms            -0.55
     perman           -0.54
     underestimated   -0.52
     parity           -0.51
    Positive Logits
    .                  0.82
    .</                0.72
     quoted            0.71
     quoting           0.69
    .).                0.68
     referring         0.67
    lied               0.66
    ."                 0.64
    ).                 0.64
    ].                 0.63
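    The page does not say how the logit lists are produced. Below is a minimal sketch under one common assumption: each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists show the lowest and highest scores. The function name `logit_effects` and the toy vectors are hypothetical, chosen only for illustration. Real pipelines may add normalization or bias terms.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (most negative, most positive) tokens.

    Assumption: a token's score is the dot product of the feature's
    decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive

# Toy 2-d example; vectors are made up, not taken from any model.
neg, pos = logit_effects(
    [1.0, 0.5],
    {".": [2.0, 0.0], " quoted": [1.0, 1.0],
     " diaper": [-1.0, 0.0], " causal": [0.0, -3.0]},
    k=2,
)
print(neg)  # [(' causal', -1.5), (' diaper', -1.0)]
print(pos)  # [('.', 2.0), (' quoted', 1.5)]
```

    Keeping token strings verbatim, including leading spaces, matters here, since " quoted" and "quoted" are distinct vocabulary entries.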
    Activation Density 0.216%
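    Assuming "activation density" means the share of sampled token positions where the feature's activation is above zero (the page does not define it), the figure could be computed as in this sketch. The helper name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions with activation > threshold."""
    acts = list(acts)
    if not acts:
        return 0.0  # No samples: report zero rather than dividing by zero.
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0
```

    Under this reading, 0.216% would correspond to the feature firing on roughly 2 of every 1,000 sampled tokens.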

    No Known Activations