INDEX
Explanations
phrases related to promoting progress or initiatives
Negative Logits
laus        -0.14
Violence    -0.13
ilded       -0.13
ucks        -0.13
jo          -0.13
cats        -0.13
namen       -0.13
CID         -0.13
ners        -0.12
106         -0.12
Positive Logits
incent       0.15
illes        0.15
ANCED        0.15
OwnProperty  0.15
advancement  0.14
INCLUDED     0.14
illary       0.14
taper        0.14
Aires        0.14
CELER        0.14
Activations Density: 0.020%