    Negative Logits
    Token            Logit
    "monger"         -0.82
    "fera"           -0.79
    " اح"            -0.79
    " cuna"          -0.75
    "Pinterest"      -0.71
    "𝒜"              -0.69
    "гії"            -0.69
    " pendidikan"    -0.67
    "pinterest"      -0.67
    " Handbuch"      -0.67
    Positive Logits
    Token            Logit
    " testing"       3.59
    " Testing"       2.70
    " tests"         2.64
    " test"          2.56
    "Testing"        2.52
    " tested"        2.27
    "testing"        2.22
    " TESTING"       2.11
    " Test"          2.02
    " Tests"         2.00
    Activation Density: 0.011%

    No Known Activations