Explanations
node or * followed by punctuation/special characters
Negative Logits
–  0.89
ل  0.79
To  0.77
पी  0.77
ي  0.75
AD  0.74
’  0.71
ر  0.71
기  0.70
ari  0.69
Positive Logits
catcher  0.98
shells  0.95
cheques  0.94
pessoais  0.93
borrar  0.93
rashes  0.91
surfers  0.88
вался  0.87
перы  0.87
surfer  0.86
Activations Density 0.002%