INDEX
Explanations
punctuation and sentence endings
Negative Logits
Token        Logit
يتيمه        -0.58
Oost         -0.57
virons       -0.55
гер          -0.55
Gemeinsame   -0.55
leſs         -0.52
rect         -0.52
MAD          -0.52
くれました     -0.52
fleisch      -0.52
Positive Logits
Token        Logit
*,           2.27
!,           2.20
?,           2.15
%,           2.12
(),          2.08
*,           2.07
/,           2.07
+,           2.04
°,           2.04
€,           2.04
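The positive and negative logit lists above rank vocabulary tokens by how strongly this feature pushes the model's output toward or away from them. A common way to obtain such lists is to project the feature's decoder direction through the unembedding matrix and sort the scores. The sketch below assumes exactly that (a plain dot product, no layer norm or rescaling), so values from a real dashboard may differ. The function name `top_logits` and the toy inputs are hypothetical.

```python
def top_logits(direction, unembed, vocab, k=10):
    """Rank tokens by the dot product of a feature direction with each unembedding column.

    Assumption: score[t] = sum_i direction[i] * unembed[i][t], with no normalization.
    direction: list of d_model floats (hypothetical feature decoder vector)
    unembed:   d_model rows, each a list of vocab_size floats (W_U)
    vocab:     list of vocab_size token strings
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    """
    scores = [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative first
    top_positive = ranked[::-1][:k]  # most positive first
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens.
    direction = [1.0, 0.5]
    unembed = [[2.0, -1.0, 0.0, 0.5],
               [1.0, 0.0, -3.0, 0.5]]
    vocab = ["!", "Oost", "MAD", "?"]
    pos, neg = top_logits(direction, unembed, vocab, k=2)
    print(pos)  # [('!', 2.5), ('?', 0.75)]
    print(neg)  # [('MAD', -1.5), ('Oost', -1.0)]
```

In a real model `vocab` has tens of thousands of entries, so a vectorized matrix product would replace the nested loop; the ranking logic stays the same.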
Activation Density: 0.228%
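Activation density is usually read as the share of token positions on which the feature fires at all. Under that assumption, 0.228% means the feature is active on roughly 2.3 of every 1,000 tokens. The minimal sketch below computes it as a percentage; the function name `activation_density` and the zero threshold are illustrative assumptions, not the dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 3.1, 0.0]))  # 25.0
    # 228 firings out of 100,000 tokens gives a density of about 0.228%.
    print(activation_density([1.0] * 228 + [0.0] * 99772))
```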