Explanations
phrases indicating change or transition
Negative Logits
faf        -0.15
_cpp       -0.14
ิศ         -0.14
Ù쨧ÙĤ      -0.14
issan      -0.14
cxx        -0.14
arkin      -0.13
enheim     -0.13
elle       -0.13
istar      -0.13
Positive Logits
follow     1.23
follow     1.17
Follow     1.16
follows    1.13
followed   1.09
Follow     1.09
FOLLOW     0.99
-follow    0.97
_follow    0.93
.follow    0.92
Activations Density 0.293%