INDEX
Explanations
phrases indicating value or importance
phrases emphasizing the concept of "worth" or value
Negative Logits
Hazel       -0.68
Sierra      -0.68
Jude        -0.66
Chamber     -0.66
Broad       -0.65
Dog         -0.65
Dogs        -0.64
Bus         -0.64
Frontier    -0.63
Cats        -0.63
Positive Logits
worth       0.99
iness       0.97
lihood      0.91
worth       0.91
ily         0.84
trade       0.83
daq         0.83
orth        0.80
ilege       0.79
worthwhile  0.78
Activations Density: 0.011%