Explanations
phrases related to power dynamics and hierarchy
phrases related to societal roles and dynamics
Negative Logits
ancies     -0.61
anton      -0.61
disclaim   -0.61
catast     -0.60
leigh      -0.59
ensions    -0.57
laws       -0.57
imar       -0.57
irin       -0.57
ritz       -0.57
Positive Logits
fodder        0.84
extraord      0.76
unto          0.69
alongside     0.67
.(            0.67
asset         0.66
amidst        0.65
beacon        0.64
forever       0.62
centerpiece   0.62
Activations Density: 0.480%