INDEX
    Negative Logits
    " Thinking"     -0.07
    " thinks"       -0.07
    " thinking"     -0.06
    " Kids"         -0.06
    " savings"      -0.06
    " Ministers"    -0.06
    " nurse"        -0.06
    " Socialist"    -0.06
    "Tag"           -0.06
    " Warning"      -0.06
    Positive Logits
    "PCODE"              0.07
    "♀♀"                 0.07
    [token not shown]    0.07
    "),'"                0.07
    [token not shown]    0.07
    " gehört"            0.07
    "unter"              0.06
    [token not shown]    0.06
    " наяв"              0.06
    " вияв"              0.06
    Act Density 0.020%

    No Known Activations