    Negative Logits
    Token         Logit
    "lasses"      -0.92
    "ipeg"        -0.88
    "manship"     -0.85
    "rah"         -0.83
    "\\\\\\\\"    -0.82
    "rill"        -0.81
    "lessness"    -0.80
    "ridges"      -0.78
    "ingham"      -0.77
    "ramer"       -0.76
    Positive Logits
    Token         Logit
    "zzi"         1.12
    " Rossi"      0.98
    "ucci"        0.95
    " Galile"     0.88
    "zzo"         0.86
    " Giul"       0.85
    "otti"        0.85
    "etta"        0.84
    " Luigi"      0.84
    " Giovanni"   0.84
    Act Density 1.156%

    No Known Activations