    Negative Logits
    Token          Logit
    ' hidden'      -1.48
    'hidden'       -1.39
    'Hidden'       -1.10
    ' Hidden'      -1.10
    ' forgotten'   -1.08
    ' concealed'   -1.07
    ' verborgen'   -0.94
    'forgotten'    -0.94
    ' oculto'      -0.90
    ' oculta'      -0.84
    Positive Logits
    Token          Logit
    'ness'         1.11
    'NESS'         0.72
    'nesses'       0.59
    'EDEFAULT'     0.56
    ' Drapeau'     0.56
    'enix'         0.54
    'NOPQRST'      0.54
    'estinal'      0.54
    '"?>'          0.53
    'dehyde'       0.51
    Activation Density: 0.171%

    No Known Activations