INDEX
    Explanations

    significant years or dates written in the format of two numbers separated by a dash and ending in a zero followed by other numbers

    specific years or numerical dates in the text

    Negative Logits
    irlf         -0.72
    lying        -0.69
     bunny       -0.68
     gren        -0.67
     peeled      -0.66
     tremend     -0.64
     hottest     -0.64
     brightest   -0.64
     holiday     -0.63
    igating      -0.63
    Positive Logits
    90           1.11
    504          0.98
    74           0.97
    88           0.97
    91           0.97
    65           0.97
    70           0.96
    80           0.96
    85           0.95
    95           0.94
    Act Density 0.038%

    No Known Activations