INDEX
Explanations
phrases related to religious texts and their authenticity
Negative Logits
644          -0.17
оÑĪ          -0.13
ilo          -0.13
dropout      -0.13
ROME         -0.13
             -0.12
683          -0.12
vice         -0.12
distributed  -0.12
Channels     -0.12
Positive Logits
pie          0.19
uese         0.19
pie          0.15
Pie          0.14
ovsky        0.14
Pie          0.14
_CRITICAL    0.14
Yüz          0.14
eneric       0.14
inz          0.14
Activations Density 0.017%