INDEX
Explanations
pronouns followed by positive or admiring sentiments
instances of the word "It" highlighting the importance of certain statements or observations
Negative Logits
Eighth      -0.68
warning     -0.61
itatively   -0.60
anton       -0.60
裏覚醒      -0.57
arthed      -0.57
NX          -0.56
Mole        -0.56
Dayton      -0.56
Mouth       -0.55
Positive Logits
ain         1.15
chy         1.11
hurts       1.05
zbollah     1.05
wasn        1.04
seems       1.02
happened    1.02
happens     1.00
depends     0.97
boils       0.96
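The two lists above rank tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Below is a minimal sketch of one common way such lists are produced, assuming a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding column (final layer norm ignored). The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, not taken from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper: logit effect of one feature on each token, taken as the
# dot product of the feature's decoder direction with that token's unembedding
# column (ignoring the final layer norm, which is an assumption).
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

# Hypothetical helper: the k most negative and the k most positive tokens,
# each list ordered from the strongest effect downward.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy 2-dimensional example; the vectors are invented for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "seems":   [1.0, 0.0],
    "happens": [0.5, -0.2],
    "warning": [-0.5, 0.5],
    "Mole":    [0.0, 0.2],
}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

A real dashboard would run the same ranking over the full vocabulary with k = 10, which is why each list above has ten entries.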
Activation Density 0.266%
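Activation density is the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the `threshold` parameter are hypothetical. The example counts are invented: 266 firing positions out of 100,000 would correspond to a density of 0.266%.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Invented example: a feature that fires on 266 of 100,000 token positions.
acts = [0.0] * 99_734 + [0.5] * 266
print(f"Activation density: {activation_density(acts):.3f}%")
```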