INDEX
Explanations
references to theatrical or performance contexts
Negative Logits
amient                -0.08
556                   -0.07
ness                  -0.07
NESS                  -0.07
.Alignment            -0.07
bakan                 -0.07
ambi                  -0.07
lope                  -0.07
GenerationStrategy    -0.07
AMPL                  -0.07
Positive Logits
presence              0.12
Presence              0.12
Presence              0.10
presence              0.10
persona               0.10
persona               0.08
Persona               0.08
yb                    0.07
personality           0.07
personas              0.06
Activations Density: 0.007%