Explanations
the word "features"
features
Negative Logits
<<<<<<<<<<<<<<    -0.97
rempliss          -0.96
refroid           -0.94
berdayakan        -0.86
bibinfo           -0.84
conservé          -0.84
refusé            -0.81
supprim           -0.81
tvguidetime       -0.79
fallu             -0.79
Positive Logits
ce                0.65
وار               0.54
ст                0.54
Talis             0.52
Read              0.52
(no token text)   0.51
cles              0.51
exhaustive        0.51
Clik              0.51
(no token text)   0.50
Activations Density 2.707%