INDEX
    Explanations

    words associated with reflection or representation

    Negative Logits
    "stant"     -0.76
    "iott"      -0.75
    "aii"       -0.74
    "ensable"   -0.71
    "uilt"      -0.71
    "--------------------------------------------------------"  -0.70
    "yright"    -0.69
    "ccoli"     -0.68
    "CVE"       -0.68
    "TAIN"      -0.68
    Positive Logits
    "ror"       1.01
    " neuron"   0.94
    " image"    0.86
    "angelo"    0.83
    " mirror"   0.82
    " Mirror"   0.80
    " neurons"  0.79
    " shards"   0.78
    "ing"       0.77
    "ocular"    0.76
    Act Density 0.050%

    No Known Activations