Explanations
references to historical demographics and categories of people or places
births and nationalities
Negative Logits
<eos>         -0.59
↵↵            -0.57
              -0.53
              -0.50
[…]           -0.49
              -0.47
              -0.47
<strong>      -0.47
              -0.45
              -0.42
Positive Logits
queſta        1.23
ſchaft        1.10
<unused52>    1.09
<unused8>     1.09
<unused41>    1.09
<unused23>    1.08
<unused28>    1.08
<unused16>    1.08
[@BOS@]       1.08
<unused14>    1.08
Activations Density 0.020%