INDEX
Explanations
references to academic journals and publications
Head Attr Weights
0:0.01
1:0.01
2:0.07
3:0.05
4:0.05
5:0.03
6:0.38
7:0.16
8:0.03
9:0.05
10:0.04
11:0.07
Negative Logits
peat: -1.42
compliment: -1.34
keyes: -1.33
erest: -1.30
excuse: -1.29
orem: -1.25
worms: -1.25
seniors: -1.25
candles: -1.23
cane: -1.22
Positive Logits
士: 1.50
Pain: 1.42
Catal: 1.41
opio: 1.40
Centauri: 1.39
龍喚士: 1.33
Wars: 1.21
usc: 1.20
los: 1.20
将: 1.20
Activations Density: 0.006%