    NEGATIVE LOGITS
    {name
    -0.08
    		               
    -0.07
    	damage
    -0.07
    áfico
    -0.07
    Ash
    -0.07
    Alexander
    -0.07
     sons
    -0.07
     Πλη
    -0.07
     конф
    -0.07
     Vand
    -0.07
    POSITIVE LOGITS
     decree
    0.13
     decre
    0.10
     scre
    0.08
    UU
    0.06
     enact
    0.06
    .clearRect
    0.06
    cw
    0.06
    0.06
    gree
    0.06
    emma
    0.06
    Activation Density: 0.001%

    No Known Activations