INDEX
Explanations
sentences indicating possession or ownership
expressions of hope and positivity
Negative Logits
hops    -0.57
lvl     -0.53
ãĥ«     -0.51
KING    -0.50
ean     -0.50
hare    -0.50
obal    -0.49
urs     -0.49
hig     -0.49
aux     -0.49
Positive Logits
.—      0.92
!,      0.89
.[      0.81
;       0.81
!       0.80
.ãĢį    0.79
,—      0.79
.       0.79
!.      0.78
.(      0.77
Activations Density: 0.969%