    Explanations

    words related to misconceptions, false beliefs, and deception

    concepts related to illusions and delusions

    Negative Logits
    "atories"     -0.72
    "RC"          -0.69
    "uter"        -0.65
    "ĵ"           -0.65
    "iary"        -0.64
    "utor"        -0.64
    "ï¸ı"         -0.63
    "RAFT"        -0.62
    "rc"          -0.62
    "received"    -0.61
    Positive Logits
    " illusion"          3.28
    " illusions"         2.99
    " Illusion"          2.32
    " delusion"          2.02
    "illusion"           1.85
    " delusions"         1.60
    " mir"               1.31
    " hallucinations"    1.29
    " impressions"       1.27
    " halluc"            1.23
    Activation Density: 0.029%

    No Known Activations