INDEX
Explanations
words and phrases indicating positive characteristics or aspects
descriptive adjectives that highlight positive attributes and significance
Negative Logits
ĸļ  -0.97
anwhile  -0.93
externalActionCode  -0.82
ensibly  -0.81
ordon  -0.81
ully  -0.78
©¶æ  -0.78
ldom  -0.77
anship  -0.77
deen  -0.76
Positive Logits
examples  1.06
contenders  1.02
moments  0.97
things  0.96
names  0.93
surprises  0.91
instances  0.91
items  0.91
paragraphs  0.90
ideas  0.89
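Top and bottom logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k highest and lowest scoring tokens. The sketch below assumes that procedure; this page does not confirm how its lists were computed. The function name `top_logit_tokens` and the toy inputs are hypothetical, and plain Python lists stand in for real weight tensors.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Hypothetical sketch: score each token by decoder_dir @ unembed.

    decoder_dir: list of d_model floats (assumed feature direction)
    unembed:     d_model rows, each a list of len(vocab) floats (assumed W_U)
    vocab:       list of token strings, one per unembedding column
    Returns (negative, positive) lists of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # Direct logit contribution of the feature direction to every token.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Token indices ordered from lowest to highest logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Highest logits first, matching the dashboard's descending list.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with a 2-dimensional model and a 3-token vocabulary.
negative, positive = top_logit_tokens(
    [1.0, 0.5],
    [[1.0, -1.0, 0.0], [0.0, 2.0, -2.0]],
    ["a", "b", "c"],
    k=1,
)
print(negative, positive)
```

With real weights the same ranking would be run over the full vocabulary, typically with `k=10` to match the ten entries shown per list.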
Activations Density 0.330%
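Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard may use a different threshold or sample. `activation_density` is a hypothetical helper. Under that reading, 0.330% corresponds to about 33 active tokens per 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 33 active tokens out of 10,000 sampled tokens.
sample = [0.0] * 9967 + [1.0] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```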