INDEX
    Explanations

    words related to negative or critical contexts

    words or terms related to surprise, anger, and various structured themes or events

    Negative Logits
     ↑            -0.70
     Leilan       -0.68
     ---------    -0.67
     Naples       -0.64
     「           -0.64
     Eugene       -0.64
     Florence     -0.63
    UMP           -0.61
    FN            -0.61
    å¿            -0.60
    Positive Logits
    "             1.10
    "],           1.09
    "]            1.01
    "!            1.00
    ",            0.98
    ":            0.97
    tainment      0.96
    "…            0.94
    ")            0.94
    "-            0.93
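    On dashboards of this kind, the Negative and Positive Logits lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest scores. The sketch below assumes exactly that; the function top_logit_effects and the arrays w_dec_row and w_unembed are hypothetical, and whether this dashboard also folds in a final layer norm or rescales the scores is not stated here.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Assumption: effect[j] = w_dec_row @ w_unembed[:, j], i.e. the
    feature's decoder row projected through the unembedding matrix,
    with no layer norm or other rescaling applied.
    """
    scores = np.asarray(w_dec_row) @ np.asarray(w_unembed)  # shape (vocab_size,)
    order = np.argsort(scores)  # ascending, so most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 1.1, -0.7],
                      [0.3, 0.9, -0.4, 0.2]])
negative, positive = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(negative)  # [('d', -0.7), ('b', -0.2)]
print(positive)  # [('c', 1.1), ('a', 0.5)]
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, np.argpartition would avoid the full sort.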
    Act Density 0.225%
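    Act Density is the share of token positions on which the feature fires; 0.225% works out to roughly one token in every 444. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold is an assumption, since the dashboard does not state it) and using a hypothetical helper activation_density:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    threshold=0.0 is an assumption about what counts as "firing".
    """
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return float(np.mean(acts > threshold))

# Toy data: 9 active positions out of 4,000 tokens gives 0.225%.
acts = np.zeros(4000)
acts[:9] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.225%
```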

    No Known Activations