INDEX
Explanations
patterns or formats in the text that resemble examples or references
Head Attr Weights
0:0.03
1:0.02
2:0.22
3:0.11
4:0.10
5:0.03
6:0.06
7:0.15
8:0.04
9:0.04
10:0.08
11:0.06
Negative Logits
predators      -1.64
bodies         -1.64
disciplines    -1.64
distractions   -1.63
ading          -1.61
instruments    -1.60
emergencies    -1.51
detectors      -1.43
isi            -1.43
sacrific       -1.42
Positive Logits
MpServer       1.77
reverse        1.64
WARD           1.64
terday         1.60
ALLY           1.55
rary           1.55
redd           1.48
ーク             1.48
VERT           1.48
VAL            1.46
Activations Density 0.000%