INDEX
    Explanations

    dates written in a specific format (month, day, year) combined with specific usernames

    Negative Logits
    " expansions"    -0.85
    " detectors"     -0.76
    " predec"        -0.72
    " successors"    -0.68
    " expansion"     -0.67
    " connections"   -0.66
    " unnecess"      -0.66
    "avorite"        -0.65
    " glim"          -0.64
    " superiors"     -0.64
    Positive Logits
    "000"      0.95
    " 2017"    0.90
    " 2016"    0.89
    " 2015"    0.88
    " 2018"    0.85
    "05"       0.85
    " 2014"    0.84
    " 2012"    0.82
    "080"      0.81
    "2010"     0.81
    Activation Density: 0.051%

    No Known Activations