Explanations
dialogue that involves advice and reflection on personal growth or accountability
Negative Logits
fuck      -0.19
fucked    -0.18
fuck      -0.17
FUCK      -0.17
fucks     -0.15
Fucking   -0.15
Fuck      -0.15
fucking   -0.15
rapes     -0.14
Fuck      -0.14
Positive Logits
buddy         0.23
partner       0.20
buddies       0.20
fellow        0.18
boss          0.17
brother       0.17
amigo         0.17
accomp        0.17
intimidating  0.17
mentor        0.17
Activations Density: 0.158%