INDEX
Explanations
Occurrences of specific keywords related to events or locations
Negative Logits
Token      Logit
Royal      -0.43
P          -0.41
Pro        -0.40
pen        -0.36
C          -0.36
B          -0.36
R          -0.36
M          -0.36
p          -0.36
di         -0.35
Positive Logits
Token          Logit
<unused23>     1.14
[@BOS@]        1.13
<unused17>     1.13
<unused42>     1.13
<unused43>     1.13
<pad>          1.13
<unused3>      1.13
<unused41>     1.13
<unused74>     1.13
<unused28>     1.13
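The page does not say how these two lists were produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the lowest and highest resulting token logits. The sketch below illustrates that assumption with toy arrays; the function name `top_bottom_logits` and its inputs are hypothetical, not taken from the source.

```python
import numpy as np

def top_bottom_logits(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: project one feature's decoder vector onto the vocabulary.

    w_dec_row: (d_model,) decoder direction for the feature (assumed input).
    w_unembed: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of token strings of length vocab_size.
    Returns (negative, positive) lists of (token, logit) pairs.
    """
    logits = w_dec_row @ w_unembed            # direct effect on each token's logit
    order = np.argsort(logits)                # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens.
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 1.1, 0.0],
                [9.0, 9.0, 9.0, 9.0]])        # second row is ignored because w_dec[1] == 0
neg, pos = top_bottom_logits(w_dec, w_u, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 1.1), ('a', 0.5)]
```

When many tokens share the same value (as with the 1.13 entries above), their relative order depends on the sort and carries no meaning.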
Activation Density: 0.266%
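Activation density is commonly read as the percentage of token positions on which the feature fires. A minimal sketch under that assumption follows. It treats "fires" as an activation strictly above a threshold of 0, which is an assumption, and `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of positions whose activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature is active on 266 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:266] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.266%
```

On this reading, a density of 0.266% means the feature is active on roughly 1 token in 376.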