Explanations
adjectives related to behaviors or traits

Negative Logits
yourselves         -0.68
ulla               -0.65
oneself            -0.64
zens               -0.61
common             -0.60
CPC                -0.60
xious              -0.58
Barg               -0.58
ourselves          -0.58
GC                 -0.58

Positive Logits
notwithstanding     0.95
shines              0.90
shone               0.88
coincide            0.86
contrasted          0.85
coupled             0.84
proved              0.81
belie               0.81
consisted           0.80
inspires            0.79
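
The two lists rank output tokens by how strongly this feature is commonly read as pushing their logits up (positive) or down (negative). Below is a minimal sketch of how such lists are often produced. It assumes that a token's weight is the dot product of the feature's decoder direction with that token's unembedding vector. The function names and toy vectors are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]

def logit_weights(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed method: project the feature's decoder direction onto each
    token's unembedding vector (a plain dot product per token)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_logits(weights: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with a 2-dimensional residual stream (hypothetical numbers).
decoder = [1.0, -0.5]
unembed = {
    "shines": [0.9, 0.0],
    "proved": [0.5, -0.6],
    "yourselves": [-0.4, 0.6],
    "common": [-0.2, 0.8],
}
pos, neg = top_logits(logit_weights(decoder, unembed), k=2)
print([t for t, _ in pos])  # ['shines', 'proved']
print([t for t, _ in neg])  # ['yourselves', 'common']
```

Sorting once and slicing from both ends keeps the two rankings consistent with each other. A real model's vocabulary would call for a vectorized matrix product instead of a Python loop.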

Activation Density: 0.279%
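
Activation density is the share of tokens on which the feature fires. At 0.279%, that is roughly 2.8 tokens per 1,000, or about 1 token in 358. Below is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero, because the dashboard's exact threshold is not stated here. The function name is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not the dashboard's documented value)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: 3 active tokens out of 1,000 gives a density of 0.3%.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
print(activation_density(acts))
```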