Explanations
proper nouns and more technical or specific terms
words related to specific measurements or statistical terms
Head Attr Weights
0:0.12
1:0.02
2:0.39
3:0.05
4:0.06
5:0.05
6:0.04
7:0.02
8:0.04
9:0.06
10:0.07
11:0.03
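As a minimal sketch of how these weights might be read: the snippet below ranks the heads by weight and computes the share of the listed total held by the top head. It assumes each entry maps an attention-head index to that head's attribution weight for this feature, which the listing itself does not define. The names `head_attr_weights` and `rank_heads` are hypothetical and chosen only for illustration.

```python
# Head attribution weights copied from the listing (head index -> weight).
# Assumption: each value is the attribution of this feature to that head.
head_attr_weights = {
    0: 0.12, 1: 0.02, 2: 0.39, 3: 0.05, 4: 0.06, 5: 0.05,
    6: 0.04, 7: 0.02, 8: 0.04, 9: 0.06, 10: 0.07, 11: 0.03,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]

# The listed values are rounded to two decimals, so the total need not be 1.
total = sum(head_attr_weights.values())
top_share = top_weight / total

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"top head share of listed total: {top_share:.1%}")
```

On these numbers, head 2 carries by far the largest weight, with head 0 a distant second.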
Negative Logits
ctrl: -1.19
Magikarp: -1.19
ocument: -1.12
Recipe: -1.08
impart: -1.07
loud: -1.06
contra: -1.05
swer: -1.04
REDACTED: -1.04
fw: -1.04
Positive Logits
heed: 1.36
uden: 1.35
imer: 1.35
warm: 1.30
itsch: 1.30
MpServer: 1.27
ascus: 1.24
hart: 1.22
emon: 1.20
gren: 1.18
Activations Density 0.032%