Negative Logits
sentences    0.76
sentence     0.75
sentence     0.73
Sentence     0.72
シンプルな    0.71
words        0.69
Worte        0.69
Words        0.68
kalimat      0.68
illusory     0.67
Positive Logits
suspects      0.84
saying        0.82
wisdom        0.82
Wisdom        0.80
complaining   0.79
jokes         0.78
Saying        0.75
Conventional  0.74
joked         0.73
saws          0.73
Activations Density 0.022%