INDEX
Explanations
personal pronouns and verbs related to knowledge or awareness
instances of personal experiences and reflections related to emotional or moral dilemmas
Negative Logits
Token        Logit
Adds         -0.60
ggles        -0.59
Adds         -0.57
Written      -0.55
ebin         -0.53
attm         -0.52
Flavoring    -0.52
shrug        -0.52
Recently     -0.52
sigh         -0.51
Positive Logits
Token        Logit
mattered     1.57
was          1.46
resembled    1.45
depended     1.43
belonged     1.43
lacked       1.42
was          1.36
seemed       1.36
wasn         1.35
amounted     1.34
Activations Density 1.396%
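
A density figure like the one above is usually read as the fraction of sampled tokens on which the feature fires. The minimal sketch below illustrates that reading. It is an assumption, not the dashboard's documented method: the function name `activation_density`, the zero threshold, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a token counts as "active" when its activation exceeds
    the threshold. The dashboard's real definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for one feature (not dashboard data).
sample = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 20.000%
```

On that reading, a density of 1.396% would mean the feature is active on roughly 14 of every 1,000 sampled tokens.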