INDEX
    Explanations

    references to mirrors and reflections

    Negative Logits
    Token       Logit
    "уч"        -0.15
    ".Throw"    -0.15
    "ury"       -0.15
    "wire"      -0.15
    "795"       -0.14
    "iform"     -0.14
    "wdx"       -0.14
    "ナー"      -0.14
    "êt"        -0.14
    "мор"       -0.14
    Positive Logits
    Token             Logit
    "reflection"      0.22
    " reflection"     0.19
    "inati"           0.17
    "mirror"          0.16
    " vanity"         0.16
    "irror"           0.16
    " reflections"    0.16
    " mirror"         0.16
    " Reflection"     0.15
    " Self"           0.15
    Activation Density: 0.045%

    No Known Activations