INDEX
    Explanations

    mentions of specific geographic locations and proper nouns

    Negative Logits
    iros        -0.19
    omor        -0.14
    ictory      -0.14
    rin         -0.14
    oded        -0.14
    ä¹IJ         -0.14
    pta         -0.13
     judgement  -0.13
    ãĥ³ãĥIJ      -0.13
    roken       -0.13
    Positive Logits
    uth         0.27
    UTH         0.20
    wich        0.19
    oxetine     0.18
    les         0.17
    ces         0.17
    mage        0.16
    umb         0.15
    quer        0.15
    rosse       0.15
    Activation Density: 0.004%

    No Known Activations