Negative Logits
arbitrary       0.47
fascinated      0.47
વ               0.46
の説明          0.46
disagreement    0.46
ANS             0.46
and             0.45
UTS             0.45
user            0.45
reactivity      0.44
Positive Logits
spesies         0.44
반려동          0.43
يد              0.43
뚠              0.42
ron             0.40
рио             0.40
పి              0.40
热              0.40
隂              0.40
рон             0.39
Activations Density: 0.001%