INDEX
    Explanations

    words related to mirrors and reflections

    Negative Logits
    Token            Logit
    "xxxxxxxx"       -0.70
    "doi"            -0.66
    "mable"          -0.64
    "enery"          -0.63
    " Carth"         -0.63
    "Fra"            -0.63
    " Calories"      -0.62
    "Ī"              -0.61
    " Vi"            -0.61
    "æ©"             -0.61

    Positive Logits
    Token            Logit
    "ror"             1.73
    "etheless"        1.05
    "ROR"             0.92
    "rors"            0.92
    "hin"             0.83
    "cffff"           0.82
    "terday"          0.80
    " guiActiveUn"    0.79
    "adeon"           0.78
    "bably"           0.77
    Activation Density: 0.011%

    No Known Activations