INDEX
Explanations
phrases expressing a final decision or judgment
common conjunctions and prepositions that indicate relationships between ideas
Negative Logits
ãĤ¶      -0.64
Ĥª       -0.61
obal     -0.60
hler     -0.55
KING     -0.55
ãĥ¼ãĥ    -0.54
teness   -0.53
argon    -0.51
ãĥ«      -0.50
Oath     -0.50
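The mojibake-looking entries (ãĤ¶, Ĥª, ãĥ¼ãĥ, ãĥ«) are most likely raw byte-level BPE vocabulary strings rather than extraction damage. The sketch below assumes the model uses a GPT-2-style byte-level tokenizer, which the page does not actually state. Under that tokenizer, each displayed character stands for one byte, and the code reverses that mapping. With that assumption, ãĤ¶ decodes to the katakana ザ and ãĥ« to ル. Ĥª and the tail of ãĥ¼ãĥ are fragments of multi-byte UTF-8 characters that only become readable when joined with neighbouring tokens. The helper names are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild a GPT-2-style reversible byte -> printable-character table.

    Assumption: the dashboard's tokenizer uses this byte-level scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> raw byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Map a displayed byte-level BPE token back to its raw bytes."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode to text; incomplete UTF-8 sequences become U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


for tok in ["ãĤ¶", "Ĥª", "ãĥ¼ãĥ", "ãĥ«", "obal"]:
    print(repr(tok), token_to_bytes(tok).hex(" "), repr(decode_token(tok)))
```

Because of `errors="replace"`, partial characters show up as U+FFFD instead of raising an exception. The printed hex bytes keep the fragments inspectable.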
Positive Logits
;        1.02
.        1.01
during   1.00
.</      0.99
.—       0.98
!        0.98
.[       0.97
!,       0.95
lest     0.95
;)       0.94
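The positive and negative logit lists are usually read as the feature's direct effect on the output vocabulary. The sketch below assumes the dashboard computes each value as the dot product of the feature's decoder direction with one column of the unembedding matrix W_U. Some tools also fold in the final LayerNorm, which this sketch ignores. The toy matrices and vocabulary are hypothetical. They only illustrate how the ranking is produced, not the values listed above.

```python
def logit_effects(decoder_dir: list[float], w_unembed: list[list[float]]) -> list[float]:
    """Dot the feature's decoder direction with each vocabulary column of W_U.

    decoder_dir has length d_model; w_unembed has d_model rows and vocab columns.
    Assumption: no LayerNorm or bias is applied before the unembedding.
    """
    vocab = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab)
    ]


def top_tokens(
    effects: list[float], vocab_strs: list[str], k: int = 10, largest: bool = True
) -> list[tuple[str, float]]:
    """Return the k (token, value) pairs with the largest (or most negative) effect."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=largest)
    return [(vocab_strs[t], round(effects[t], 2)) for t in order[:k]]


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
w_unembed = [
    [0.9, 0.1, -0.3, 0.4],  # residual dimension 0
    [-0.2, 0.8, 0.6, 0.0],  # residual dimension 1
]
vocab_strs = [";", "during", "obal", "."]

effects = logit_effects(decoder_dir, w_unembed)
print("positive:", top_tokens(effects, vocab_strs, k=2, largest=True))
print("negative:", top_tokens(effects, vocab_strs, k=2, largest=False))
```

With a real model the same two calls would run over the full vocabulary with `k=10`, which is the length of each list on this page.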
Activation Density: 1.069%
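Activation density is commonly defined as the share of sampled token positions on which the feature fires, meaning its activation is above zero. The page does not spell out the definition, so treat this as an assumption. Under that definition, 1.069% means the feature is active on roughly one token in 94. A minimal sketch with a hypothetical list of activations:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds threshold.

    Assumption: "active" means activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 token positions; most are exactly 0.
acts = [0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.7, 0.0]
print(f"{activation_density(acts):.3f}%")  # 2 of 8 positions active: prints 25.000%
```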