INDEX
Explanations
references to caring for and expressing love towards someone
references to a specific female subject and her relationship with others
Negative Logits
Token          Logit
ACP            -0.89
arios          -0.73
irm            -0.73
ocument        -0.68
ディ           -0.67
scrib          -0.66
ffic           -0.63
omination      -0.62
Hasan          -0.60
adobe          -0.60
Positive Logits
Token          Logit
Majesty        1.13
POV            0.94
majesty        0.91
psyche         0.73
◼              0.67
backstory      0.62
reluct         0.60
motivation     0.59
iets           0.59
psychology     0.57
Activations Density 0.303%