INDEX
Explanations
phrases indicating changes or conversions
Negative Logits
Token        Logit
"}")         -0.70
hésite       -0.64
Kenne        -0.63
Dade         -0.57
peper        -0.56
comigo       -0.55
testnet      -0.55
ednesdays    -0.55
Diwedd       -0.54
معلومات      -0.54
Positive Logits
Token        Logit
值为         0.71
become       0.71
a            0.68
Become       0.67
deviennent   0.65
diventare    0.65
改为         0.62
变成         0.62
zerw         0.62
becoming     0.62
Activations Density 0.422%