INDEX
Explanations
adjectives that express strong or negative qualities
Head Attr Weights
0:0.02
1:0.02
2:0.06
3:0.06
4:0.05
5:0.04
6:0.41
7:0.11
8:0.04
9:0.04
10:0.05
11:0.05
Negative Logits
DAY         -1.36
Centauri    -1.23
neighb      -1.21
ゴン        -1.21
Dinosaur    -1.19
Za          -1.18
Pool        -1.10
Jeanne      -1.10
accuser     -1.10
Xia         -1.10
Positive Logits
(>          1.57
arnaev      1.51
ersive      1.45
cientious   1.40
anto        1.39
rez         1.37
ileged      1.35
esta        1.35
agric       1.34
tarian      1.31
Activations Density 0.012%