INDEX
Explanations
expressions of hopefulness and engagement with the reader
Negative Logits
ought       -0.16
should      -0.16
shouldBe    -0.15
SHOULD      -0.15
skulle      -0.14
.should     -0.14
ÏĢι         -0.14
apper       -0.14
trebuie     -0.14
Should      -0.13
Positive Logits
enjoyed     0.22
guys        0.22
enjoy       0.20
enjoys      0.19
Enjoy       0.18
Enjoy       0.18
Guys        0.18
enjoying    0.17
agree       0.17
found       0.15
Activations Density: 0.048%