INDEX
    Explanations

    claims and misconceptions about various topics, particularly health and societal issues

    Negative Logits
    "ết"            -0.16
    " TODO"         -0.14
    "igits"         -0.14
    "ourmet"        -0.14
    " åĵģ"          -0.14
    "zzo"           -0.13
    "ænd"           -0.13
    "cef"           -0.13
    " Classified"   -0.13
    "orsche"        -0.13
    Positive Logits
    " myths"        0.42
    " myth"         0.41
    " Myth"         0.36
    " perception"   0.33
    " perceptions"  0.31
    " stereotypes"  0.30
    " mythology"    0.29
    " commonly"     0.27
    " false"        0.27
    " miscon"       0.27
    Activation Density: 0.345%

    No Known Activations