    Negative Logits
    Token          Logit
    " fewer"       -0.07
                   -0.07
    "üst"          -0.06
    " secretary"   -0.06
    " chir"        -0.06
    "ISK"          -0.06
    "IIIK"         -0.06
    "ASI"          -0.06
    " Levy"        -0.06
    "318"          -0.06
    Positive Logits
    Token          Logit
    " Rome"        0.15
    " Roma"        0.10
    " Rom"         0.09
    " Pompe"       0.09
    " Romans"      0.07
    " rom"         0.07
    "Rom"          0.07
    " Ring"        0.07
    "me"           0.07
    " Magnetic"    0.07
    Activation Density: 0.007%

    No Known Activations