INDEX
    Explanations

    references to Wikipedia

    mentions of Wikipedia and related references

    Negative Logits
        " uncond"      -0.69
        "atch"         -0.65
        " pressed"     -0.65
        "hea"          -0.64
        " festive"     -0.61
        "erc"          -0.60
        " tal"         -0.60
        " beads"       -0.59
        "ebted"        -0.59
        "icip"         -0.59
    Positive Logits
        " Wikipedia"   3.87
        "Wikipedia"    3.27
        " wik"         2.57
        " Wikimedia"   2.52
        "wikipedia"    2.51
        " Wik"         2.05
        "ipedia"       1.99
        " Wiki"        1.98
        " wiki"        1.92
        "Wik"          1.91
    Activation Density: 0.020%

    No Known Activations