Explanations
phrases or words denoting value or worthiness
references to the concept of worthiness
Negative Logits
gdala      -0.69
eteria     -0.69
esville    -0.67
Zig        -0.66
ole        -0.66
ulated     -0.65
Rove       -0.65
Boo        -0.64
Frazier    -0.63
iq         -0.62
Positive Logits
minded         0.85
deserving      0.83
successor      0.82
worthy         0.80
successors     0.79
consideration  0.79
contenders     0.78
lihood         0.75
aspirations    0.72
indignation    0.72
Activations Density 0.017%