INDEX
    Explanations

    the word "topic" followed by a number

    references to specific subjects or themes in various contexts

    Negative Logits
    "ramid"    -0.73
    "ardo"     -0.70
    "ignt"     -0.68
    "igned"    -0.67
    "othy"     -0.66
    "alty"     -0.66
    "hovah"    -0.66
    "xon"      -0.64
    "ATES"     -0.63
    "arus"     -0.62
    Positive Logits
    " topics"     0.96
    " Topics"     0.88
    "matter"      0.87
    "Topic"       0.83
    " topic"      0.82
    "topic"       0.79
    " debated"    0.77
    "Topics"      0.77
    "afety"       0.77
    "icular"      0.73
    Act Density 0.027%

    No Known Activations