INDEX
    Explanations
    NEGATIVE LOGITS
     fron    -0.07
    .unwrap    -0.07
     больш    -0.07
     وز    -0.07
     profound    -0.06
    ero    -0.06
     experienced    -0.06
     terrible    -0.06
    	synchronized    -0.06
     yanında    -0.06
    POSITIVE LOGITS
    gom    0.06
     Murphy    0.06
     Alexandra    0.06
    онах    0.06
    iday    0.06
     oxidation    0.06
    ("~/    0.06
     contacting    0.06
    zx    0.06
    زم    0.06
    Activation Density: 0.003%

    No Known Activations