Explanations
phrases related to summaries or overall assessments
Head Attr Weights
0:0.04
1:0.02
2:0.29
3:0.07
4:0.14
5:0.04
6:0.05
7:0.03
8:0.12
9:0.04
10:0.05
11:0.06
Negative Logits
ヘ: -1.65
rank: -1.58
score: -1.57
ahime: -1.52
drawn: -1.51
gow: -1.50
Luck: -1.46
Compare: -1.44
execute: -1.42
third: -1.42
Positive Logits
ensable: 1.89
igmat: 1.69
ookie: 1.52
ensional: 1.50
appings: 1.50
geries: 1.47
hetical: 1.45
earances: 1.43
aques: 1.41
ody: 1.39
Activations Density 0.001%