INDEX
Explanations
proper nouns and names of places or entities
Negative Logits
"         -0.46
(blank)   -0.44
all       -0.43
“         -0.42
-         -0.40
and       -0.40
or        -0.40
(         -0.39
not       -0.39
in        -0.38
Positive Logits
المعيارى         1.02
ロウィン            0.91
betweenstory     0.90
שוליים           0.84
WriteTagHelper   0.84
ſelben           0.83
ſammen           0.82
[@BOS@]          0.79
<unused14>       0.79
<unused28>       0.79
Activations Density 1.047%