INDEX
    Explanations
    Negative Logits
     Deutschland
    -0.07
     gep
    -0.06
     και
    -0.06
     comentarios
    -0.06
    carrier
    -0.06
     öne
    -0.06
    _mock
    -0.06
    isers
    -0.06
     všech
    -0.06
    helpers
    -0.06
    Positive Logits
     detecting
    0.08
     detect
    0.07
    qu
    0.07
     copy
    0.07
     XI
    0.06
    0.06
    	find
    0.06
    ’я
    0.06
     normalized
    0.06
    ---↵
    0.06
    Act Density 0.006%

    No Known Activations