Explanations
syntax-related tokens, particularly statement terminators and parentheses
Negative Logits
ś         -0.54
khid      -0.53
åd        -0.53
h         -0.52
stå       -0.51
garde     -0.51
ad        -0.50
sûr       -0.50
league    -0.50
hers      -0.50
Positive Logits
])));     2.02
());      2.01
]));      1.96
));       1.94
);        1.93
)));      1.92
)');      1.92
);        1.88
')));     1.86
)]);      1.85
Activations Density: 0.118%