Explanations
expressions of significant challenges or problems faced
Negative Logits
var           -0.27
vara          -0.24
verschillen   -0.24
differently   -0.24
Lem           -0.23
niem          -0.22
courants      -0.21
officiels     -0.21
lessened      -0.21
h             -0.21
Positive Logits
Geiſt         0.77
Waſſer        0.75
zwiſchen      0.73
ſeyn          0.72
<unused7>     0.71
<pad>         0.71
<unused20>    0.71
Menſchen      0.71
<unused41>    0.71
<unused43>    0.71
Activations Density: 1.778%