Explanations
punctuation marks, specifically periods
Negative Logits
lenker
-0.69
فريبيس
-0.67
betweenstory
-0.59
<>",
-0.59
invokingState
-0.54
صوتيه
-0.53
poveznice
-0.51
ãng
-0.50
Karo
-0.50
ftagPool
-0.50
Positive Logits
.
1.04
(.
0.91
'.
0.91
('.
0.90
//.
0.80
;->
0.76
".
0.76
','.
0.75
.$.
0.75
<.
0.73
Activations Density 0.032%