INDEX
Explanations
adjectives describing the quality or desirability of items
adjectives that express quality or desirability
Negative Logits
lees       -0.93
ometers    -0.88
steps      -0.88
ernels     -0.86
aucuses    -0.84
hops       -0.82
tests      -0.79
ships      -0.79
levels     -0.78
events     -0.78
Positive Logits
piece      1.21
chunk      1.07
person     0.99
sized      0.98
paycheck   0.98
dose       0.96
meal       0.96
pair       0.94
slice      0.92
adversary  0.91
Activations Density: 0.403%
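
The page does not state how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading on made-up data; the function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which activate the feature.
toy = [0.0] * 996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.403% would correspond to the feature firing on roughly 4 of every 1000 token positions.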