INDEX
Explanations
words related to past actions or completed tasks
phrases related to academic or formal communication
Negative Logits
eatures        -0.58
ghai           -0.57
quartered      -0.53
Sundays        -0.52
edIn           -0.50
hospitality    -0.49
rely           -0.49
pilgr          -0.49
ofi            -0.48
Saturdays      -0.48
Positive Logits
/,             0.75
(!             0.73
;              0.72
.              0.68
haha           0.67
lol            0.67
:)             0.66
!:             0.65
("             0.63
etc            0.63
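On dashboards of this kind, the negative and positive logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the smallest and largest entries. That this page computes them exactly this way is an assumption. The plain-Python sketch below shows only the ranking step on toy numbers. `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, not part of any dashboard API.

```python
from typing import List, Tuple

def top_logit_effects(
    decoder_dir: List[float],        # hypothetical feature decoder direction, length d_model
    unembed: List[List[float]],      # hypothetical unembedding, shape [d_model][vocab]
    vocab: List[str],                # token strings, one per unembedding column
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: effect[t] = sum_i decoder_dir[i] * unembed[i][t].
    """
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    # Sort once (ascending), then slice both ends so the two lists stay consistent.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive

# Toy example: 2-dimensional model, 3-token vocabulary.
neg, pos = top_logit_effects(
    decoder_dir=[1.0, -1.0],
    unembed=[[0.5, 0.1, -0.2], [0.1, 0.4, 0.3]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

With a real vocabulary of tens of thousands of tokens, the nested loop would normally be replaced by a single vectorised matrix-vector product.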
Activations Density 1.186%
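Activation density is usually read as the share of token positions on which the feature fires, i.e. has an activation above zero. Assuming that definition, 1.186% means the feature is active on roughly 1 token in 84. A minimal sketch with a hypothetical `activation_density` helper and an assumed threshold of 0:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy example: the feature fires on 2 of 4 tokens.
print(activation_density([0.0, 0.3, 0.0, 1.2]))
```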