    Negative Logits
    Token        Logit
    (blank)      -0.09
    " TJ"        -0.08
    " Thomas"    -0.07
    "patch"      -0.07
    " Assad"     -0.07
    "Thomas"     -0.07
    " kezel"     -0.07
    " hiv"       -0.07
    " צד"        -0.07
    "ummings"    -0.07
    Positive Logits
    Token          Logit
    "-European"    0.10
    "-Europe"      0.09
    "-hours"       0.09
    " PMC"         0.08
    " frü"         0.08
    " scream"      0.08
    " Scholar"     0.08
    " screaming"   0.08
    " سرچ"         0.08
    " лит"         0.08
    Activation Density: 0.002%

    No Known Activations