INDEX
Explanations
descriptions of people or characters along with a particular physical attribute
phrases that describe people or objects accompanied by specific attributes or actions
Negative Logits
Merit       -0.80
Provided    -0.73
estate      -0.70
ICO         -0.70
FW          -0.70
Poll        -0.67
iversal     -0.67
hereafter   -0.66
hower       -0.64
here        -0.64
Positive Logits
stood        1.46
drawn        1.27
draw         1.08
clenched     1.02
sunglasses   1.01
goggles      1.00
headphones   0.99
electrodes   0.98
scissors     0.97
horns        0.96
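The two logit lists rank vocabulary tokens by how strongly this feature raises or lowers their output logits. Dashboards of this kind commonly obtain these scores by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k highest and k lowest entries. That is an assumption here, not something this page states. The sketch below uses pure Python. The helper names `logit_effects` and `top_logits`, and the toy matrices in the test, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: dot a feature's decoder direction (length d_model)
    with each token's unembedding column. `unembed` is d_model x vocab_size."""
    d_model = len(decoder_dir)
    scores: Dict[str, float] = {}
    for j, tok in enumerate(vocab):
        # Contribution of this feature direction to token j's logit.
        scores[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return scores

def top_logits(scores: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens by score.
    sorted() is stable, so tied scores keep their insertion order."""
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative
```

For example, a few of the scores listed on this page give:

```python
scores = {"Merit": -0.80, "Provided": -0.73, "stood": 1.46,
          "drawn": 1.27, "sunglasses": 1.01}
pos, neg = top_logits(scores, 2)
# pos → [("stood", 1.46), ("drawn", 1.27)]
# neg → [("Merit", -0.80), ("Provided", -0.73)]
```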
Activation density: 0.148%
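A density of 0.148% corresponds to roughly 148 active tokens per 100,000, or about 1.5 per 1,000. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation is strictly above a threshold of 0. The page does not state the exact threshold or sample, so both are assumptions. `activation_density` is a hypothetical helper.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0, i.e. the feature 'fires')."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

With 148 nonzero activations among 100,000 tokens, `activation_density` returns 0.148.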