Explanations
distinctive adjectives and comparative phrases
Negative Logits
elin          -0.15
fflush        -0.14
.ax           -0.14
Shut          -0.14
li            -0.14
ven           -0.14
lic           -0.14
DonaldTrump   -0.13
bank          -0.13
frank         -0.13
Positive Logits
ones          0.17
è¾ĥ           0.16
ãĥ³ãĥĶ        0.16
ICLE          0.16
než           0.16
-than         0.15
Ones          0.15
ones          0.15
portions      0.15
ksam          0.14
Activations Density 0.109%