INDEX
Explanations
square brackets, mathematical symbols, and diagram labels
Negative Logits
ainfi       -1.24
myſelf      -1.17
enfans      -1.17
Monfieur    -1.17
himſelf     -1.16
étoient     -1.14
ſche        -1.14
poffible    -1.12
feroit      -1.12
cauſe       -1.12
Positive Logits
pos         0.67
'           0.63
di          0.59
Me          0.59
Her         0.58
gran        0.57
mid         0.57
la          0.55
el          0.55
p           0.54
Activations Density: 0.490%