Explanations
occurrences of significant events or actions, particularly those involving changes or decisions in various contexts
Negative Logits
tend       -0.16
tended     -0.15
yal        -0.15
iated      -0.14
/          -0.14
.          -0.13
Tells      -0.13
deserve    -0.13
contain    -0.13
consec     -0.13
POSITIVE LOGITS
follows    0.36
follow     0.26
Follow     0.24
marks      0.23
Follow     0.23
follow     0.22
coinc      0.22
comes      0.22
.follow    0.21
FOLLOW     0.21
Activations Density 0.158%