Explanations
terms related to societal issues and critiques of popular or professional narratives
adjectives describing characteristics or attributes
Negative Logits
yip          -0.84
Collider     -0.74
auga         -0.73
trak         -0.72
uckland      -0.69
adelphia     -0.69
cients       -0.68
pload        -0.67
maxwell      -0.65
Nare         -0.64
Positive Logits
alike            1.01
agendas          0.96
collaborations   0.91
interactions     0.90
performances     0.89
punishments      0.89
architectures    0.88
behaviors        0.88
interventions    0.87
philosophies     0.86
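The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows one common way such rankings are produced. It assumes, without confirmation from this page, that a token's logit effect is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The functions `logit_effects` and `top_and_bottom` and the toy vocabulary and weights are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: a feature's direct logit effect on each vocabulary token,
# ASSUMING effect[t] = sum_i decoder_dir[i] * unembed[i][t] (decoder direction
# projected through the unembedding). Not confirmed as this page's method.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: logit effect} for every token in vocab.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each holding one weight per vocabulary token.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per residual dimension")
    effects = {}
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with token t's unembedding column.
        effects[token] = sum(decoder_dir[i] * unembed[i][t]
                             for i in range(len(decoder_dir)))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy values only; real models have thousands of dimensions and tokens.
    vocab = ["alike", "yip", "agendas"]
    decoder_dir = [1.0, 0.5]
    unembed = [[1.0, -1.0, 0.5],
               [0.0, 0.2, 0.8]]
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), 1)
    print("negative:", neg)
    print("positive:", pos)
```

Under this assumption, the effect is a static property of the weights: it describes which tokens the feature promotes or suppresses when it fires, independent of any particular input.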
Activations Density 0.490%
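The density figure is read here, as an assumption, to mean the percentage of sampled token positions on which the feature's activation is above zero. On that reading, 0.490% corresponds to about 49 firing positions per 10,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density, ASSUMED to be the percentage of
# token positions whose feature activation exceeds a threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 49 firing positions out of 10,000 gives a density of about 0.49%.
    sample = [1.0] * 49 + [0.0] * (10_000 - 49)
    print(f"{activation_density(sample):.3f}%")
```

A low density like this suggests a sparse feature that fires on a small, specific slice of the input distribution.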