INDEX
    Explanations

    Copilot, sales pitches, internet knowledge

    NEGATIVE LOGITS
    ان    0.53
    (blank)    0.49
    (blank)    0.47
    (blank)    0.46
    expensive    0.46
    hundred    0.45
    (blank)    0.45
    ک    0.45
    ப்பில்    0.44
    λ    0.44
    POSITIVE LOGITS
     odio    0.51
     x    0.49
     velké    0.49
     aquello    0.48
     getir    0.47
     lector    0.47
     feitos    0.47
     diversas    0.46
     ix    0.46
     queso    0.46
    Act Density 0.007%

    No Known Activations