INDEX
Explanations
words indicating completion or finality
Negative Logits
lvl     -0.58
ル      -0.57
obal    -0.54
aux     -0.54
alky    -0.53
hops    -0.52
ols     -0.51
urs     -0.50
ノ      -0.49
KING    -0.49
Positive Logits
.[      1.00
!.      0.95
.—      0.93
;       0.93
,—      0.92
!       0.91
!,      0.90
.(      0.90
.       0.88
;)      0.88
Activation density: 0.641%
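Assuming activation density means the percentage of token positions on which the feature's activation is positive, 0.641% corresponds to roughly 6 firing tokens per 1,000. A minimal sketch of that calculation follows; the function `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    `activations` is assumed to be a flat list with one feature
    activation per token position in the sampled text.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One of four positions fires, so the density is 25%.
    print(activation_density([0.0, 0.3, 0.0, 0.0]))  # 25.0
```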