INDEX
Explanations
references to social hierarchies and dynamics
Negative Logits
opensource    -0.15
007           -0.14
overall       -0.14
boyc          -0.14
mau           -0.13
artner        -0.13
flesh         -0.13
ran           -0.13
Jas           -0.13
彡            -0.13
Positive Logits
thing         0.29
idea          0.27
issue         0.26
concept       0.25
aspect        0.25
situation     0.21
thing         0.21
story         0.21
scenario      0.20
experiment    0.20
Activations Density 0.501%