INDEX
Explanations
place names that are substrings
Negative Logits
א          0.54
на         0.47
trashItem  0.46
וא         0.45
on         0.43
ج          0.43
ik         0.41
نا         0.41
墓志       0.40
ח          0.40
Positive Logits
'          0.55
有         0.43
是         0.40
to         0.40
ę          0.39
或         0.39
soothing   0.38
我         0.38
其         0.37
毒         0.36
Activation density: 0.052%