    Explanations
    NEGATIVE LOGITS
    Token          Logit
    " scp"         -0.78
    " Sae"         -0.75
    " Daria"       -0.75
    " Jiao"        -0.74
    " purpoſe"     -0.71
    " Arp"         -0.68
    " Cæsar"       -0.67
    "selves"       -0.66
    "ണം"           -0.66
    " onOptions"   -0.65
    POSITIVE LOGITS
    Token          Logit
    " With"        1.80
    " WITH"        1.80
    " with"        1.73
    "with"         1.68
    "With"         1.59
    "WITH"         1.55
    " avec"        1.45
    " Avec"        1.42
    "Avec"         1.34
    " עם"          1.21
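The logit lists rank vocabulary tokens by how strongly this feature's direction raises (positive) or lowers (negative) their output logits. A minimal sketch of how such a ranking can be produced, assuming each weight is the dot product of the feature's decoder direction with the matching column of the unembedding matrix W_U. That is a common convention for sparse-autoencoder dashboards, but it is an assumption here, and any normalization a real pipeline applies is ignored. The helpers `feature_logit_weights` and `top_and_bottom` and all numbers are hypothetical toy values, not data from this feature.

```python
# Sketch only: assumes logit weight = decoder_dir . W_U[:, token].
# All names and numbers are hypothetical toy values.

def feature_logit_weights(decoder_dir, W_U):
    """Return one weight per vocabulary token: dot(decoder_dir, W_U[:, t])."""
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights, vocab, k):
    """Return the k highest-weight tokens and the k lowest (most negative first)."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [" with", " avec", " scp", "selves"]
decoder_dir = [1.0, 0.5]
W_U = [  # shape d_model x vocab_size
    [1.0, 0.8, -0.5, -0.2],
    [0.4, 0.6, -0.3, -0.6],
]
weights = feature_logit_weights(decoder_dir, W_U)
top, bottom = top_and_bottom(weights, vocab, k=2)
print("positive:", [tok for tok, _ in top])
print("negative:", [tok for tok, _ in bottom])
```

Sorting once and reading both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.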
    Act Density 0.427%
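Act Density is the share of sampled tokens on which the feature fires. At 0.427%, that is roughly 4 tokens in every 1,000 (about one token in 234), so the feature is sparse. A minimal sketch, assuming "fires" means an activation strictly greater than zero; the function `activation_density` and the example activations are hypothetical.

```python
# Sketch only: assumes a token counts as active when its activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 2 of 8 tokens are active, so the density is 25%.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # prints 25.000%
```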

    No Known Activations