INDEX
    Explanations

    positive and negative descriptive terms

    words related to the concept of surprise or unexpectedness

    Negative Logits
    FN          -0.67
     Kik        -0.64
     Shepard    -0.64
     ---------  -0.63
     Naples     -0.63
     Leilan     -0.62
     Bret       -0.62
     Florence   -0.60
    Nap         -0.60
     ãĤ         -0.59
    Positive Logits
    "           1.02
    "!          0.97
    "],         0.93
    "…          0.92
    terday      0.91
    tainment    0.91
    ",          0.90
    "]          0.90
    ":          0.90
    usterity    0.86
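    The logit lists above rank tokens by how strongly this feature pushes their output logits up or down. The sketch below shows one way such lists can be produced. It assumes, as is common for sparse-autoencoder dashboards though the page does not say so, that each token's effect is the dot product of the feature's decoder direction with that token's unembedding vector. The function names, vectors, and toy vocabulary are hypothetical.

```python
import heapq

def logit_effects(direction, unembed):
    """Dot each token's unembedding vector with the feature direction.

    `unembed` maps token string -> unembedding vector (hypothetical layout).
    """
    return {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }

def top_logits(effects, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative

# Hypothetical 2-dimensional toy example; not the real model's weights.
direction = [1.0, -0.5]
unembed = {
    '"': [0.8, -0.4],
    "terday": [0.5, -0.2],
    " Kik": [-0.4, 0.4],
    "FN": [-0.5, 0.6],
}
pos, neg = top_logits(logit_effects(direction, unembed), k=2)
print(pos)  # most positive first
print(neg)  # most negative first
```

    Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only the top k entries are shown.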
    Activation Density: 0.229%
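    Activation density is read here as the fraction of sampled token positions on which the feature fires. The sketch below assumes a strict threshold of zero; the page states neither the threshold nor the sample size. The function name and the sample are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Hypothetical sample: the feature fires on 229 of 100,000 tokens.
acts = [0.0] * (100_000 - 229) + [0.5] * 229
print(f"{activation_density(acts):.3%}")  # 0.229%
```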

    No Known Activations