INDEX
Explanations
rum leading to rumors or names
Negative Logits
Token        Logit
clueless     0.39
}}=\         0.38
}-[          0.38
tiende       0.37
independ     0.36
testimonial  0.36
distribute   0.35
ollar        0.35
isActive     0.35
kikh         0.34

Positive Logits
Token        Logit
ุส           0.41
boli         0.41
rum          0.41
ુક           0.40
बे           0.40
deprived     0.40
depriving    0.39
eping        0.38
deprive      0.38
checks       0.38

Activation Density: 0.004%