INDEX
Explanations
numeric values and references to key entities or concepts in the text
Negative Logits
W    -0.66
E    -0.60
S    -0.59
M    -0.56
C    -0.54
I    -0.54
O    -0.54
B    -0.53
K    -0.51
D    -0.50
Positive Logits
itſelf     1.15
iſt        1.14
Efq        1.14
leſs       1.13
Theſe      1.13
ſever      1.12
Anſ        1.12
ſelves     1.11
ſeveral    1.11
Reſ        1.10
Activations Density: 1.914%