Explanations
comparative phrases indicating a greater or lesser degree of something
Head Attribution Weights
Head 0: 0.02
Head 1: 0.01
Head 2: 0.11
Head 3: 0.04
Head 4: 0.04
Head 5: 0.02
Head 6: 0.09
Head 7: 0.39
Head 8: 0.03
Head 9: 0.05
Head 10: 0.06
Head 11: 0.09
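As a quick reading aid, the short sketch below takes the head weights listed above and finds the head with the largest weight and its share of the total. The values are rounded and sum to about 0.95 rather than exactly 1, so the share is only approximate. Reading the weights as relative contributions is an assumption on our part, and the helper names are purely illustrative.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.11, 3: 0.04, 4: 0.04, 5: 0.02,
    6: 0.09, 7: 0.39, 8: 0.03, 9: 0.05, 10: 0.06, 11: 0.09,
}

def dominant_head(weights):
    """Return the (head, weight) pair with the largest attribution weight."""
    return max(weights.items(), key=lambda kv: kv[1])

def share_of_total(weights, head):
    """Fraction of the summed (rounded) weight carried by a single head."""
    return weights[head] / sum(weights.values())

head, w = dominant_head(head_weights)
print(f"head {head}: {w:.2f} ({share_of_total(head_weights, head):.0%} of total)")
# prints: head 7: 0.39 (41% of total)
```

Head 7 carries roughly as much weight as the next four largest heads combined (0.11 + 0.09 + 0.09 + 0.06 = 0.35).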
Negative Logits
bsite           -1.88
reconciliation  -1.70
onut            -1.56
rame            -1.53
acebook         -1.53
balance         -1.44
imore           -1.44
reflection      -1.43
Recon           -1.42
ocene           -1.38
Positive Logits
Enough               1.50
ACTIONS              1.46
Goth                 1.45
[undecodable bytes]  1.39
scorn                1.37
aneers               1.35
Merit                1.35
shown                1.33
KK                   1.31
MQ                   1.31
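This page does not say how the two logit lists above were produced. One common approach in feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and then keep the k tokens with the highest and lowest scores. The sketch below shows that approach under this assumption. Every matrix, vector, and token name in it (`decoder_dir`, `unembed`, `tok_a` and so on) is a hypothetical toy value, not data from this feature.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats (assumed feature direction).
    unembed: d_model x vocab_size matrix, given as a list of rows.
    Returns a dict mapping each token to its direct logit effect.
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Hypothetical toy values (d_model = 2, vocabulary of 4 tokens).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, -1.0, 0.25, -0.5],
    [0.0, 1.0, -1.0, 1.0],
]

effects = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(effects, 2)
print("positive:", top)
print("negative:", bottom)
```

Listing the negative side most-negative-first matches the order used in the lists above. A real implementation would use a tensor library, not nested Python loops.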
Activation Density: 0.001%
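A density of 0.001% equals a fraction of 1e-5, so the feature is expected to fire on about 10 of every 1,000,000 tokens. The sketch below makes that arithmetic explicit. It assumes density means the percentage of tokens on which the activation exceeds zero. The exact definition and any threshold used by the dashboard are not stated here, so the `threshold` parameter is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

def expected_activations(density_percent, n_tokens):
    """Expected number of firing tokens in a sample of n_tokens, given density in percent."""
    return density_percent / 100 * n_tokens

print(f"{expected_activations(0.001, 1_000_000):.1f}")
# prints: 10.0
```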