INDEX
Explanations
mentions of romantic or sexual relationships
Head Attr Weights
0:0.01
1:0.02
2:0.06
3:0.08
4:0.12
5:0.03
6:0.03
7:0.27
8:0.03
9:0.03
10:0.18
11:0.08
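
A minimal sketch for ranking the heads listed above by weight. It assumes each `index:weight` entry is the attribution weight of one attention head in a 12-head layer; the names `head_weights` and `top_heads` are hypothetical and not part of the dashboard.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is one attention head's share of the attribution.
head_weights = {
    0: 0.01, 1: 0.02, 2: 0.06, 3: 0.08, 4: 0.12, 5: 0.03,
    6: 0.03, 7: 0.27, 8: 0.03, 9: 0.03, 10: 0.18, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)
total = sum(head_weights.values())

print(top)              # heads 7, 10 and 4 carry the most weight
print(round(total, 2))  # the listed weights sum to slightly less than 1
```

Under that reading, heads 7 and 10 together account for roughly half of the listed weight.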
Negative Logits
////////          -1.71
icter             -1.59
matchup           -1.53
concise           -1.51
¯¯                -1.48
Clear             -1.47
////////////////  -1.45
unders            -1.45
auntlet           -1.44
ampions           -1.43
Positive Logits
uranium  1.48
Meg      1.38
Lago     1.35
Kare     1.34
rumors   1.33
Twice    1.30
Malfoy   1.29
666      1.29
Freddie  1.28
favors   1.27
Activations Density 0.001%