Explanations
phrases that indicate the concept of beginnings or starts
Negative Logits
ola       -0.17
ede       -0.16
URA       -0.16
alie      -0.16
енту      -0.15
ed        -0.15
gren      -0.14
gmt       -0.14
ulence    -0.14
enton     -0.14
Positive Logits
nings          0.38
/end           0.32
-middle        0.28
ning           0.24
stages         0.24
NING           0.22
-stage         0.21
steps          0.20
/Foundation    0.20
,end           0.17
Activations Density 0.035%