INDEX
Explanations
phrases related to societal issues and controversies
discussions surrounding social issues and critiques of societal behaviors
Negative Logits
stabilized    -0.71
recovered     -0.71
reunited      -0.68
Transition    -0.63
repaired      -0.62
awaited       -0.61
ocamp         -0.61
matured       -0.61
Fixed         -0.61
iscovered     -0.61
Positive Logits
ignores        1.40
violates       1.33
undermines     1.21
shouldn        1.13
contradicts    1.13
justifies      1.01
ought          0.97
begs           0.96
doesnt         0.95
constitutes    0.95
Activations Density: 0.679%