INDEX
Explanations
positive descriptive words and expressions
adjectives and descriptive phrases expressing positivity or critique
Negative Logits
ãĥĺãĥ©            -0.70
anges             -0.70
adj               -0.69
opy               -0.64
_-                -0.63
ould              -0.62
=-=-=-=-=-=-=-=-  -0.61
OULD              -0.61
Domain            -0.61
annot             -0.60
Positive Logits
lately        1.09
fruitful      1.00
since         0.99
unsuccessful  0.81
successful    0.79
awhile        0.78
productive    0.75
steady        0.74
steadily      0.72
since         0.72
Activations Density 0.310%