INDEX
Explanations
punctuation marks and contextually relevant phrases that indicate a narrative or conversational flow
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.05
4:0.25
5:0.03
6:0.19
7:0.07
8:0.04
9:0.03
10:0.09
11:0.08
Negative Logits
ナ  -1.41
ネ  -1.29
dating  -1.27
Sever  -1.25
ヴァ  -1.23
igham  -1.19
natal  -1.18
urance  -1.18
Cosmetic  -1.16
surfaces  -1.15
Positive Logits
rounder  1.39
indo  1.38
anwhile  1.36
jew  1.35
ophon  1.34
bart  1.32
surv  1.31
du  1.30
starve  1.29
ribly  1.29
Activations Density 0.001%