Explanations
bracketed text or placeholders
Negative Logits
ية  0.78
не  0.76
(.)  0.68
咡  0.61
지  0.58
ى  0.57
虍  0.57
孽  0.56
唁  0.56
蚜  0.56
Positive Logits
!]  1.19
?]  1.18
ל  1.09
,]  1.03
י  0.89
+]  0.86
()]  0.86
ب  0.86
]  0.85
ED  0.84
Activations Density 0.190%