    Explanations

    words related to long-term patterns or changes

    references to trends or patterns over time

    Negative Logits
    ned                 -0.83
    oÄŁ                 -0.83
    gha                 -0.79
    ded                 -0.75
    INGTON              -0.71
    \\\\\\\\            -0.68
    unts                -0.67
    lain                -0.66
    ×ŀ                  -0.65
    \\\\\\\\\\\\\\\\    -0.65
    Positive Logits
    etting      1.06
    ettings     1.03
    etter       1.01
    uggest      0.96
    omething    0.94
    afety       0.92
    ynt         0.92
     trends     0.91
    hooting     0.90
    hips        0.89
    Activation Density: 0.034%

    No Known Activations