INDEX
Explanations
references to emotional manipulation and accountability in relationships
Negative Logits
sed       -0.20
amba      -0.19
ston      -0.19
Ston      -0.19
Sed       -0.17
785       -0.16
sust      -0.15
Wagner    -0.15
ijo       -0.15
eldon     -0.15
Positive Logits
Sarah     1.11
sar       1.00
Sarah     0.99
SAR       0.92
Sar       0.85
sar       0.76
Sara      0.67
Saras     0.51
Sanders   0.50
Darwin    0.50
Activations Density 0.124%