    Explanations

    patterns of zero or non-zero activations indicative of responses to mathematical factorization or divisibility questions

    Negative Logits
    " AssemblyCulture"      -0.85
    "OGND"                  -0.74
    " gynhyrchwyd"          -0.71
    "MigrationBuilder"      -0.70
    " الرياضيه"             -0.69
    "ConstraintMaker"       -0.67
    "jspb"                  -0.65
    " queſta"               -0.63
    " zwiſchen"             -0.63
    "AddHtmlAttribute"      -0.63
    Positive Logits
    " again"        0.41
    " another"      0.36
    "again"         0.35
    "Another"       0.34
    " yine"         0.33
                    0.33
    "Again"         0.33
    " Larsen"       0.33
    "static"        0.32
    "還有"          0.32
    Act Density 0.037%

    No Known Activations