INDEX
Explanations
comparisons and contrasts in opinions or descriptions
Negative Logits
Token         Logit
vulgar        -0.15
rude          -0.14
theid         -0.14
bold          -0.14
adrenaline    -0.14
agger         -0.14
çĭĤ           -0.14
lively        -0.14
Independence  -0.14
ût            -0.14
Positive Logits
Token         Logit
gent          0.45
gentle        0.44
soft          0.42
soft          0.38
gent          0.38
softer        0.36
Soft          0.36
Soft          0.35
peaceful      0.34
quiet         0.32
Activations Density 0.578%
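
The page does not say how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not define the term.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical data: 1000 sampled tokens, of which 6 activate the feature.
sample = [0.0] * 994 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.578% would mean the feature is active on roughly 6 of every 1000 sampled tokens.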