INDEX
    NEGATIVE LOGITS
    Token         Logit
    " Carthag"    -0.45
    " PPS"        -0.43
    " fep"        -0.43
    (blank)       -0.43
    " Liverpool"  -0.42
    " Gamer"      -0.42
    "tomas"       -0.42
    " Yellow"     -0.41
    " ricks"      -0.41
    " Napoleon"   -0.41
    POSITIVE LOGITS
    Token       Logit
    " grace"    1.92
    "grace"     1.59
    " gracia"   1.42
    " GRACE"    1.41
    " grazia"   1.37
    "Grace"     1.34
    " Grace"    1.34
    " graces"   1.32
    " graça"    1.31
    "GRACE"     1.24
    Activation Density 0.005%

    No Known Activations