INDEX
    Explanations

    words related to distinguishing reality from misinformation

    Negative Logits
    " Fishing"      -0.15
    "olist"         -0.15
    "oli"           -0.14
    " riv"          -0.14
    " earth"        -0.14
    "ator"          -0.14
    "coder"         -0.14
    "ventus"        -0.13
    " Retro"        -0.13
    "enz"           -0.13
    Positive Logits
    " factual"      0.19
    "<quote"        0.17
    " unin"         0.17
    " facts"        0.16
    "actual"        0.15
    "chants"        0.15
    "facts"         0.15
    "opak"          0.14
    " falsehood"    0.14
    " distortion"   0.14
    Activation Density: 0.360%

    No Known Activations