INDEX
Explanations
adjectives related to impact or importance
descriptive adjectives indicating significant or noteworthy features
Negative Logits
ãĤ±             -0.71
ruary           -0.70
uilding         -0.70
ternity         -0.68
ioxide          -0.68
agine           -0.68
aval            -0.68
ravel           -0.66
ordon           -0.65
ategory         -0.64
Positive Logits
aspect          1.48
thing           1.37
aspects         1.17
part            1.16
feature         1.16
pecul           1.09
element         1.06
takeaway        1.03
characteristic  1.00
facet           1.00
Activations Density 0.141%