INDEX
Explanations
compliments and positive feedback
mentions of personal attributes or opinions related to people
Negative Logits
actionDate    -0.73
kefeller      -0.66
Mehran        -0.66
srfAttach     -0.65
lifelong      -0.65
ptive         -0.64
eros          -0.60
ordinary      -0.59
¬¼            -0.58
女            -0.57
Positive Logits
seem          1.37
seems         1.21
seemed        1.21
mentioned     1.01
clearly       0.99
hinted        0.98
evidently     0.97
wisely        0.97
sounded       0.94
kindly        0.94
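The two logit tables rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). The sketch below assumes the common approach of taking the dot product between the feature's decoder direction and each token's unembedding vector, then sorting; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, and the dashboard that produced these numbers may apply extra steps (for example a final layer norm) that are not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed definition of 'logit')."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example with made-up 2-dimensional vectors (not real model weights).
decoder = [1.0, -0.5]
vocab = {
    "seem": [1.0, -0.5],       # effect = 1.0 + 0.25 = 1.25
    "ordinary": [-0.5, 0.25],  # effect = -0.5 - 0.125 = -0.625
    "kindly": [0.5, 0.0],      # effect = 0.5
}
top, bottom = top_and_bottom(logit_effects(decoder, vocab), k=1)
print(top, bottom)
```

With real weights the same ranking over the full vocabulary would yield lists shaped like the ones above, with the largest positive effects at the top of the positive table and the most negative at the top of the negative table.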
Activation Density 0.721%
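Activation density is read here as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` is hypothetical, and the dashboard's own threshold and sampling of tokens are not stated in the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`
    (assumed definition; the threshold of 0.0 is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 721 active positions out of 100,000 gives 0.00721, i.e. 0.721%.
acts = [0.0] * 99_279 + [1.0] * 721
print(f"{activation_density(acts) * 100:.3f}%")
```

A density this low means the feature is silent on the vast majority of tokens, which is consistent with a narrowly scoped feature.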