NEGATIVE LOGITS
ณฑ์    0.39
Ac    0.37
に示す    0.37
Rav    0.35
বাহিত    0.35
क्लियर    0.34
שׁ    0.34
別    0.34
ted    0.33
दिखाना    0.33
POSITIVE LOGITS
weird    1.16
odd    1.05
strange    1.05
奇怪    1.00
weird    0.98
Weird    0.95
oddly    0.93
bizarre    0.93
quirks    0.93
quirk    0.90
Activations Density 0.018%