INDEX
    Explanations

    words related to reflection or mirroring

    references to mirrors and mirror imagery

    Negative Logits
    Logit   Token
    -0.80   "--------------------------------------------------------"
    -0.74   "CVE"
    -0.74   "ensable"
    -0.71   "stant"
    -0.70   "ndra"
    -0.69   "ties"
    -0.68   "estern"
    -0.67   "iott"
    -0.67   "iencies"
    -0.66   "tions"

    Positive Logits
    Logit   Token
    1.02    "ror"
    1.00    " neuron"
    0.96    "ocular"
    0.95    " image"
    0.88    " mirror"
    0.87    "angelo"
    0.87    " reflection"
    0.86    "image"
    0.86    " Mirror"
    0.85    " images"

    Activation Density: 0.033%

    No Known Activations