INDEX
    Explanations
    Negative Logits
    Token            Logit
    " sitios"        -0.92
    "tituzione"      -0.82
    " digo"          -0.81
    "itteet"         -0.80
    (not shown)      -0.79
    "Müller"         -0.79
    "моро"           -0.78
    " pais"          -0.77
    "mortar"         -0.76
    " ünl"           -0.75
    Positive Logits
    Token            Logit
    " forgetting"    2.88
    " forgets"       2.86
    " forget"        2.75
    " overlooks"     2.72
    " overlook"      2.42
    " забы"          2.34
    "forget"         2.27
    " forgot"        2.22
    " ignores"       2.11
    " overlooking"   2.05
    Activation Density: 0.061%

    No Known Activations