Explanations
discussing ethical relationships and transformation
Negative Logits
Romantic  0.48
romantic  0.41
💏  0.40
Romantic  0.39
legality  0.39
romantic  0.39
porn  0.39
合法 ("legal")  0.39
Paths  0.38
bloodshed  0.38
Positive Logits
overcoming  0.64
overcame  0.63
overcome  0.60
overcomes  0.56
克服 ("overcome")  0.55
positive  0.55
interesting  0.53
преодо (truncated Russian token, stem of "overcome")  0.51
friendly  0.50
solving  0.50
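A common way to obtain a feature's top positive and negative logits is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that method; the function name `top_logits`, the argument shapes, and the toy data are hypothetical, and the exact scaling used to produce the values listed above is not stated here.

```python
import numpy as np

def top_logits(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the feature's direct logit effect.

    decoder_vec: (d_model,) decoder direction of the feature (assumed input)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:       list of token strings, length vocab_size
    """
    # Each entry is the logit change the feature's direction contributes to one token.
    logits = np.asarray(decoder_vec) @ np.asarray(W_U)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]])
pos, neg = top_logits(np.array([1.0, 0.5]), W_U, ["a", "b", "c"], k=2)
print(pos)  # [('a', 1.0), ('b', 0.5)]
print(neg)  # [('c', -1.0), ('b', 0.5)]
```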
Activation density: 0.122%
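Activation density is usually read as the fraction of tokens in a sample on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the helper name `activation_density` are assumptions for illustration):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    # The mean of a boolean mask is the share of True entries.
    return float((acts > threshold).mean())

# 122 active tokens out of 100,000 gives a density of 0.122%.
acts = np.zeros(100_000)
acts[:122] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.122%
```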