INDEX
    Explanations
    NEGATIVE LOGITS
     myſelf
    -1.01
     تضيفلها
    -0.99
     themſelves
    -0.98
     Eſ
    -0.97
     purpoſe
    -0.94
     itſelf
    -0.93
    theless
    -0.89
     Transparency
    -0.86
     himſelf
    -0.85
     poffible
    -0.85
    POSITIVE LOGITS
     Dog
    1.90
     dog
    1.89
    Dog
    1.82
     dogs
    1.80
     DOG
    1.68
    dog
    1.63
     Dogs
    1.62
    Dogs
    1.49
    DOG
    1.44
     DOGS
    1.40
    Activation Density 0.045%

    No Known Activations