INDEX
Explanations
specific pronouns and verb forms related to ownership or obligation
Negative Logits
Token      Logit
Ĥİ         -0.72
eta        -0.64
assisted   -0.64
ufact      -0.63
eros       -0.63
inqu       -0.62
ascert     -0.60
Recomm     -0.60
ensured    -0.60
angan      -0.59
Positive Logits
Token          Logit
firsthand      0.89
VIDEOS         0.87
unfold         0.86
therapist      0.82
similarities   0.82
resemblance    0.79
replay         0.76
psychiatrist   0.74
similarity     0.74
positives      0.73
Activations Density: 0.205%