INDEX
Explanations
phrases involving evaluation or comparison of concepts
Head Attr Weights
0:0.02
1:0.02
2:0.10
3:0.06
4:0.10
5:0.04
6:0.13
7:0.24
8:0.03
9:0.05
10:0.06
11:0.10
Negative Logits
eatures      -1.62
Technology   -1.49
Appearance   -1.44
apeake       -1.44
Materials    -1.38
ptr          -1.36
Depth        -1.33
Reilly       -1.31
Sea          -1.29
Ba           -1.26
Positive Logits
prescribing  1.55
authorizing  1.41
veto         1.41
restraining  1.38
lah          1.33
excuses      1.29
finer        1.29
levers       1.29
blinking     1.29
ringing      1.29
Activations Density: 0.002%