INDEX
Explanations
adjectives describing negative attributes or actions
negative connotations and associations related to various topics
Negative Logits
theless        -0.81
terday         -0.69
Invalid        -0.65
Shining        -0.65
individually   -0.64
Awakening      -0.64
Enhanced       -0.63
lished         -0.63
dated          -0.63
enriched       -0.62
Positive Logits
ocations        1.02
aution          0.99
ptions          0.93
ours            0.92
ippers          0.92
notations       0.90
angs            0.90
tones           0.89
oles            0.88
urances         0.88
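The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The card does not say how these values were computed. A common approach in SAE dashboards is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The minimal NumPy sketch below assumes that approach; `top_bottom_logits`, `w_dec_row`, and `w_unembed` are hypothetical names, not part of any specific tool's API.

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    Assumes w_dec_row has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size), so their product gives one score per token.
    """
    effects = w_dec_row @ w_unembed          # shape: (vocab_size,)
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["a", "b", "c"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -1.0, 2.0],
                      [0.0,  0.0, 0.0]])
neg, pos = top_bottom_logits(w_dec_row, w_unembed, vocab, k=1)
print(neg, pos)  # [('b', -1.0)] [('c', 2.0)]
```

With a real model, `k=10` would produce two ten-entry lists shaped like the ones above.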
Activation Density: 0.310%
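Activation density is usually read as the fraction of token positions in a sample on which the feature fires. At 0.310%, that works out to roughly 3 in every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The card does not state its threshold or sample size, so `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of positions where the feature is active.

    Assumes `acts` holds one activation value per token position.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy sample: 1,000 token positions, the feature fires on 3 of them.
acts = np.zeros(1000)
acts[:3] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.300%
```

Python's `%` format specifier multiplies by 100, so a raw fraction of 0.0031 would be displayed as `0.310%`.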