Explanations
words related to freshness or newness
Negative Logits
Token   Logit
si      -0.18
tion    -0.18
tx      -0.18
ean     -0.17
ys      -0.17
té      -0.17
sap     -0.17
hai     -0.17
e       -0.16
ta      -0.16
Positive Logits
Token   Logit
hest    0.31
coes    0.28
cos     0.26
her     0.24
coe     0.22
cob     0.22
cura    0.21
nel     0.20
co      0.19
coln    0.18
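The two lists are the ten most negative and ten most positive entries of the feature's effect on output token logits. The sketch below shows how such a pair of ranked lists could be produced from a token-to-value mapping. The input dict and the function name top_and_bottom_logits are hypothetical; how the underlying values are computed is not stated here and is not modelled.

```python
def top_and_bottom_logits(logit_effects, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    logit_effects: hypothetical dict mapping token string -> logit effect.
    """
    # Sort ascending by value; ties keep their original order (stable sort).
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom

# Usage with a few values taken from the lists above
effects = {"hest": 0.31, "coes": 0.28, "cos": 0.26,
           "si": -0.18, "tion": -0.18, "ean": -0.17}
top, bottom = top_and_bottom_logits(effects, k=2)
print(top)     # [('hest', 0.31), ('coes', 0.28)]
print(bottom)  # [('si', -0.18), ('tion', -0.18)]
```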
Activation Density: 0.006%
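Activation density is read here as the percentage of tokens on which the feature fires. That reading, and the choice of "activation > 0" as the firing threshold, are assumptions rather than something stated on this page. Under them, 0.006% means roughly 6 active tokens in every 100,000. The function name activation_density is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold.

    Assumes 'firing' means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 6 active tokens out of 100,000 gives a density of 0.006%
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 10_000] = 1.0
print(activation_density(acts))  # 0.006
```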