INDEX
Explanations
phrases indicating conditions or reasons, particularly those involving anonymity or justification
Head Attr Weights
0:0.09
1:0.05
2:0.03
3:0.19
4:0.05
5:0.05
6:0.05
7:0.02
8:0.30
9:0.08
10:0.02
11:0.02
Negative Logits
Subject: -1.66
NOW: -1.59
disappoint: -1.55
ラン: -1.53
embed: -1.49
terr: -1.45
dyn: -1.44
ById: -1.44
Straw: -1.42
precon: -1.40
Positive Logits
Leilan: 1.70
yip: 1.61
ferry: 1.58
fellowship: 1.45
���: 1.44
Chinatown: 1.40
poke: 1.40
ilities: 1.39
Crossing: 1.39
iltration: 1.37
Activations Density: 0.001%