INDEX
    Explanations

    numerical data and mentions of historical dates

    Negative Logits
    Token        Logit
    "ANGER"      -0.14
    "ãģĵãģĿ"     -0.14
    "Χ"          -0.14
    "rive"       -0.14
    "çĴĥ"        -0.13
    "stoi"       -0.13
    " Worce"     -0.13
    "regor"      -0.13
    "OLOR"       -0.13
    "ãģ¬"        -0.13
    Positive Logits
    Token        Logit
    "194"        0.20
    "195"        0.20
    "196"        0.19
    "fo"         0.18
    "190"        0.17
    " pr"        0.16
    "197"        0.16
    "193"        0.16
    "189"        0.16
    "191"        0.15
    Act Density 0.077%

    No Known Activations