INDEX
Explanations
phrases expressing reluctance or unwillingness
phrases expressing desires or intentions
Negative Logits
Token         Logit
Compass       -0.80
grounds       -0.78
hooting       -0.75
metadata      -0.67
Needs         -0.64
Kinnikuman    -0.63
herer         -0.62
Kings         -0.61
agonists      -0.61
Returning     -0.60
Positive Logits
Token         Logit
anymore       1.03
bother        1.01
spoil         0.92
offend        0.92
celebrate     0.92
capitalize    0.89
interfere     0.88
compete       0.88
participate   0.87
hear          0.86
Activations Density: 0.081%