Explanations
phrases indicating direction or guidance
Negative Logits
ÙĪØ§Ùĩ    -0.17
nds       -0.16
artz      -0.15
555       -0.14
IMIZE     -0.14
flip      -0.13
át        -0.13
озÑı      -0.13
tw        -0.13
ari       -0.13
Positive Logits
track     0.37
track     0.32
_track    0.28
Track     0.27
Track     0.27
course    0.26
-track    0.26
.track    0.24
tracks    0.23
TRACK     0.22
Activations Density 0.049%