INDEX
Explanations
mentions of close relationships and strong emotional connections
references to friendship and social relationships
Negative Logits
overe          -0.78
vertisement    -0.76
authorized     -0.72
ucks           -0.71
itial          -0.68
inis           -0.67
informed       -0.67
odo            -0.65
ijing          -0.63
deliber        -0.62
Positive Logits
Romeo          0.87
Valerie        0.77
lier           0.75
hips           0.74
liness         0.72
Draco          0.71
Huma           0.71
Giul           0.68
Flavoring      0.67
Tad            0.67
Activations Density: 0.104%