INDEX
    Explanations

    phrases related to deception or illusion

    terms related to metaphorical representations and abstract concepts

    Negative Logits
    reditary            -0.98
    igree               -0.88
    rences              -0.84
    aeper               -0.83
    arya                -0.78
    ogun                -0.77
    uled                -0.76
    gments              -0.76
    raviolet            -0.76
    uilt                -0.75
    Positive Logits
    女                  0.93
    ãĥ¢                 0.80
    fish                0.80
    phony               0.74
    ãĥķãĤ¡              0.71
     gad                0.70
     fig                0.68
    crop                0.66
    ãĥ¡                 0.65
    ————————————————    0.64
    Act Density 0.028%

    No Known Activations