INDEX
    Explanations
    Negative Logits
    " scam"          -0.07
    " accru"         -0.06
    "endale"         -0.06
    " correspond"    -0.06
    " assure"        -0.06
    " Nom"           -0.06
    " Carrie"        -0.06
    " Carr"          -0.06
    " USC"           -0.06
    " başına"        -0.06
    Positive Logits
    " Light"         0.18
    " light"         0.18
    "Light"          0.17
    " LIGHT"         0.14
    " lights"        0.13
    "light"          0.13
    " Lights"        0.12
    "LIGHT"          0.12
    "\tlight"        0.12
    "Lights"         0.11
    Activation Density: 0.038%

    No Known Activations