INDEX
    Explanations

    phrases related to philosophical and political discussions

    Negative Logits
     undet        -0.76
     neighb       -0.74
     censored     -0.74
     coral        -0.74
     lifes        -0.73
     spitting     -0.73
     que          -0.73
     ability      -0.73
     bid          -0.72
     bloc         -0.72
    Positive Logits
    Advertisement     1.86
    Advertisements    1.62
    Anyway            1.61
    Related           1.60
    However           1.60
    What              1.59
    Because           1.58
    But               1.58
    Unfortunately     1.58
    Nevertheless      1.57
    Activation Density: 1.464%

    No Known Activations