INDEX
Explanations
German instructions and phrasing
Negative Logits
Duits    0.62
DAS      0.60
DAS      0.56
Dtsch    0.55
Das      0.54
német    0.54
daß      0.52
šport    0.52
よそ     0.52
äsident  0.52
Positive Logits
onboard      0.46
методом      0.43
iteratively  0.42
iterative    0.42
bere         0.40
Beans        0.40
Contains     0.39
Bel          0.39
adapt        0.39
bel          0.39
Activations Density 0.040%