    Explanations

    consider historical context

    Negative Logits
    Token           Value
    " wiki"         0.68
    " wikipedia"    0.66
    " Wikidata"     0.65
    " wik"          0.61
    " Wikipedia"    0.58
    " вікі"         0.56
    " ویکی"         0.54
    " Wiki"         0.53
    " wikip"        0.49
    " Wikimedia"    0.49
    Positive Logits
    Token                    Value
    (token not recovered)    0.76
    ".—"                     0.68
    (token not recovered)    0.66
    "−−"                     0.66
    " --"                    0.66
    "?—"                     0.65
    " ——"                    0.63
    ".--"                    0.60
    " —,"                    0.59
    " User"                  0.56
    Activation Density: 0.003%

    No Known Activations