Explanations
references to rewards and incentives for engagement
Negative Logits
nearly      -0.24
almost      -0.22
Nearly      -0.17
Nearly      -0.17
almost      -0.16
Almost      -0.16
anker       -0.14
Almost      -0.14
thể         -0.14
約          -0.14
Positive Logits
100         0.40
500         0.38
10          0.27
250         0.27
Hundred     0.26
50          0.26
hundred     0.25
ten         0.24
999         0.23
thousand    0.22
Activations Density 0.101%
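
As a rough illustration of how a density figure like the one above can be read, the sketch below computes the fraction of tokens on which a feature fires. It assumes that "density" means the share of tokens with a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [3.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.101% would mean the feature is active on roughly one token in a thousand.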