INDEX
    Explanations
    Negative Logits
    անո         -0.83
    proposal    -0.78
     Kraw       -0.78
     energías   -0.77
     shield     -0.74
     salesman   -0.74
     песен      -0.72
    deletion    -0.72
    oglio       -0.71
     vitamin    -0.71
    Positive Logits
     ballet     2.80
     Ballet     2.20
     dancers    2.05
     ballerina  1.95
     dancer     1.95
    ballet      1.86
     baller     1.66
    🩰           1.48
     dance      1.47
    バレ          1.45
    Act Density 0.028%

    No Known Activations