Explanations
phrases indicating desires or intentions
instances of the word "to"
Negative Logits
Token          Logit
Compass        -0.82
Enhancement    -0.73
metadata       -0.67
grounds        -0.67
Independence   -0.64
Kinnikuman     -0.64
Millennium     -0.64
ements         -0.61
wikipedia      -0.60
Measures       -0.59

Positive Logits
Token          Logit
spoil           1.19
bother          1.14
offend          1.14
admit           1.10
waste           1.09
spend           1.08
hear            1.07
lose            1.03
burden          1.01
risk            1.00
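
Dashboards like this one commonly build the logit lists above in a specific way. They project the feature's decoder direction through the model's unembedding matrix, then keep the most negative and most positive entries. The sketch below illustrates that procedure and assumes this is how the lists were produced. The functions `logit_effects` and `top_and_bottom` are hypothetical, and so are the toy two-dimensional weights. None of them come from this page, and the sketch does not reproduce the values listed here.

```python
def logit_effects(decoder_dir, unembed):
    """Project a decoder direction through an unembedding matrix.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U).
    Returns effect[t] = sum_i decoder_dir[i] * unembed[i][t] for each token t.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]                 # most negative first
    top = list(reversed(ranked[-k:]))   # most positive first
    return top, bottom


# Toy example: 2-dimensional residual stream, 4-token vocabulary (all values made up).
tokens = ["spoil", "bother", "Compass", "metadata"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.5, -0.3],
    [0.2, 0.4, -0.6, -0.7],
]
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, tokens, k=2)
print("positive:", top)
print("negative:", bottom)
```

A real model would use the full vocabulary and k = 10 to match the ten entries shown in each list.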

Activation density: 0.087%
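
Activation density is usually read as the share of tokens in a sample on which the feature fires. The sketch below makes two assumptions. First, "fires" means an activation strictly above a threshold of 0, and that threshold is a hypothetical default. Second, the sample is a flat list of per-token activations. The function name `activation_density` is also hypothetical. Under these assumptions, 87 firing tokens out of 100,000 gives the 0.087% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 87 of 100,000 tokens.
sample = [1.0] * 87 + [0.0] * (100_000 - 87)
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```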