INDEX
Explanations
phrases indicating progress or development
Negative Logits
ongoing     -0.17
avig        -0.16
isters      -0.15
roj         -0.15
uous        -0.14
edis        -0.14
Trap        -0.14
reetings    -0.14
idis        -0.14
enas        -0.14
Positive Logits
lengths     0.21
extremes    0.19
tangent     0.19
motions     0.19
fishing     0.18
tilt        0.17
routes      0.17
route       0.17
broke       0.17
crazy       0.17
Activations Density: 0.107%