INDEX
    Negative Logits
    agami           -0.74
    moire           -0.74
     Stav           -0.73
    elski           -0.73
    šak             -0.73
     Habib          -0.73
    Sexo            -0.71
     NHK            -0.71
    Granada         -0.70
    エット          -0.69
    Positive Logits
     Confucius      1.13
     Confucian      1.01
    Conf            0.96
                    0.87
     孔             0.86
                    0.86
     rectification  0.85
     ethical        0.84
     Ethical        0.82
     Sage           0.81
    Activation Density: 0.011%

    No Known Activations