    Explanations

    information and references from Wikipedia

    Negative Logits
    Token              Logit
    " charism"         -0.75
    "pter"             -0.74
    "cffffcc"          -0.73
    "stra"             -0.71
    "rone"             -0.69
    "sbm"              -0.69
    "taboola"          -0.69
    " Bethlehem"       -0.68
    "eping"            -0.68
    "ayed"             -0.65
    Positive Logits
    Token              Logit
    "ipedia"           1.38
    " Commons"         1.13
    " encyclopedia"    1.00
    "pedia"            0.98
    " Wikipedia"       0.93
    "wiki"             0.89
    "Leaks"            0.88
    "Wikipedia"        0.86
    " Template"        0.85
    " edits"           0.84
    Activation Density: 0.009%

    No Known Activations