Explanations
text within brackets
square brackets
Negative Logits
Token        Logit
wagen        -0.82
occas        -0.77
pens         -0.75
nesday       -0.73
edIn         -0.73
emouth       -0.70
thous        -0.69
mable        -0.68
rooft        -0.68
Orn          -0.68
Positive Logits
Token        Logit
?]            1.32
edit          1.21
ˈ             1.19
!]            1.15
sic           1.15
actionDate    1.09
Footnote      1.06
emphasis      1.03
laughs        1.02
%]            1.00
Activation density: 0.040%