INDEX
Explanations
theory and practical knowledge
Negative Logits
sciatica  0.75
᱕  0.75
теп  0.73
ных  0.73
тации  0.71
ва  0.70
качество  0.69
впечатление  0.67
dục  0.67
влияние  0.66
Positive Logits
Element  0.96
et  0.95
2  0.93
Theory  0.91
us  0.89
א  0.88
Theoretical  0.87
తో  0.86
y  0.85
w  0.85
Activations Density: 0.018%