INDEX
    Explanations
    Negative Logits
     beina        -0.41
    PhysRev       -0.38
     préfé        -0.38
     derfor       -0.38
     ſtate        -0.37
    因而          -0.36
     مشين         -0.35
    ambién        -0.35
     juſ          -0.35
     avoient      -0.35
    Positive Logits
    .*")]         0.75
    "}";          0.71
    ;";           0.67
    ;';           0.66
    )';           0.65
    ]';           0.65
    }');          0.63
    }>;           0.63
    "]));         0.61
    ::*;          0.61
    Activation Density: 0.019%

    No Known Activations