INDEX
Explanations
modal verbs and phrases indicating conditions or choices
Head Attr Weights
0:0.11
1:0.05
2:0.01
3:0.17
4:0.10
5:0.11
6:0.04
7:0.02
8:0.11
9:0.20
10:0.01
11:0.02
Negative Logits
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯:-1.93
��:-1.87
soon:-1.87
someday:-1.82
uries:-1.72
================:-1.71
�:-1.70
Quantity:-1.63
ById:-1.62
Badge:-1.62
Positive Logits
ammad:2.10
remotely:1.85
disclaim:1.74
dispar:1.74
honoured:1.72
tains:1.71
cluded:1.69
gender:1.68
honored:1.68
referenced:1.68
Activations Density 0.001%