INDEX
Explanations
instances of the word "it," particularly in contexts indicating experiences or sentiments
Negative Logits
ustr        -0.16
stu         -0.15
uum         -0.15
upp         -0.14
uzu         -0.14
881         -0.14
stoff       -0.13
IVO         -0.13
gilt        -0.13
reasonable  -0.13
Positive Logits
felt        0.23
feels       0.21
feel        0.20
truly       0.19
anagan      0.17
feel        0.16
Feel        0.16
felt        0.16
Feel        0.15
honestly    0.15
Activations Density: 0.078%