INDEX
Explanations
Words related to "don't".
Head Attr Weights
0:0.08
1:0.05
2:0.09
3:0.07
4:0.08
5:0.09
6:0.08
7:0.08
8:0.07
9:0.08
10:0.10
11:0.10
Negative Logits
surv:-1.79
rewrite:-1.63
サーティワン:-1.61
survives:-1.59
metics:-1.59
CoC:-1.58
ertation:-1.48
ividual:-1.45
hyde:-1.44
discipline:-1.44
Positive Logits
Spit:1.81
Friend:1.72
inav:1.64
DK:1.63
abroad:1.61
nings:1.57
Ambassador:1.55
INTON:1.55
gin:1.54
reb:1.52
Activations Density 0.000%