INDEX
    NEGATIVE LOGITS
    Token                    Logit
    " hogy"                  -0.88
    "𝙆"                      -0.86
    " basadas"               -0.83
    (token not rendered)     -0.79
    " некото"                -0.79
    " wios"                  -0.75
    "一下子"                 -0.75
    " 뉴"                    -0.73
    " seinem"                -0.72
    "kuun"                   -0.72
    POSITIVE LOGITS
    Token                    Logit
    " Wikiped"               0.99
    " Give"                  0.98
    " Let"                   0.98
    " Ave"                   0.96
    " DIAM"                  0.90
    "Für"                    0.88
    "brano"                  0.85
    " Come"                  0.85
    " Ain"                   0.84
    " 我"                    0.82
    Activation Density: 0.078%

    No Known Activations