Explanations
instances where there is a willingness to do something
expressions of willingness or unwillingness to take action
Negative Logits
icle       -0.87
icles      -0.76
inas       -0.74
agraph     -0.70
adish      -0.70
Regions    -0.69
Sections   -0.67
Panther    -0.66
Anthem     -0.65
NCT        -0.64
Positive Logits
willingness     1.05
unwillingness   0.93
yip             0.91
guiActiveUn     0.90
attitude        0.88
ï¸              0.86
terday          0.78
willingly       0.76
stance          0.75
reluctance      0.74
Activations Density 0.006%