    Explanations

    terms related to societal issues and critiques of popular or professional narratives

    adjectives describing characteristics or attributes

    Negative Logits
    Token          Logit
    "yip"          -0.84
    " Collider"    -0.74
    "auga"         -0.73
    "trak"         -0.72
    "uckland"      -0.69
    "adelphia"     -0.69
    "cients"       -0.68
    "pload"        -0.67
    "maxwell"      -0.65
    " Nare"        -0.64
    Positive Logits
    Token                Logit
    " alike"             1.01
    " agendas"           0.96
    " collaborations"    0.91
    " interactions"      0.90
    " performances"      0.89
    " punishments"       0.89
    " architectures"     0.88
    " behaviors"         0.88
    " interventions"     0.87
    " philosophies"      0.86
    Activation Density: 0.490%

    No Known Activations