INDEX
Explanations
words related to positive emotions or characteristics
adjectives that describe various qualities or characteristics
Negative Logits
ajor           -0.85
Downloadha     -0.71
strengthened   -0.67
inaug          -0.67
corresponding  -0.67
authorized     -0.65
quart          -0.63
supported      -0.63
conservancy    -0.63
onto           -0.63
Positive Logits
ness            1.25
ly              1.13
Enough          1.08
nesses          1.07
NESS            1.00
est             0.97
enough          0.95
LY              0.87
Bastard         0.83
glers           0.82
Activations Density: 0.218%
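The dashboard does not say how these numbers are produced. A common convention for sparse-autoencoder feature dashboards (an assumption here, not confirmed by this page) is that each logit list ranks vocabulary tokens by the feature's decoder direction projected through the model's unembedding, and that activation density is the share of tokens on which the feature fires. The minimal Python sketch below shows only the ranking step and the density arithmetic; the function names `top_logits` and `activation_density`, the zero threshold, and the toy activations are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_logits(
    scores: Dict[str, float], k: int = 10, positive: bool = True
) -> List[Tuple[str, float]]:
    """Return the k tokens with the highest scores (positive=True)
    or the most negative scores (positive=False)."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ordered[:k]


def activation_density(
    activations: Sequence[float], threshold: float = 0.0
) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.
    The zero threshold is an assumed convention, not taken from the dashboard."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Usage with a few token scores copied from the listing above.
scores = {"ness": 1.25, "ly": 1.13, "ajor": -0.85, "Downloadha": -0.71, "onto": -0.63}
print(top_logits(scores, k=2))                  # [('ness', 1.25), ('ly', 1.13)]
print(top_logits(scores, k=2, positive=False))  # [('ajor', -0.85), ('Downloadha', -0.71)]

# Hypothetical toy activations: 218 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(218):
    acts[i * 400] = 1.0
print(activation_density(acts))  # 0.218
```

Under this reading, a density of 0.218% means the feature is active on roughly 2 of every 1,000 tokens.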