INDEX
Explanations
still under development or learning
Negative Logits
Token	Logit
:	1.35
i	1.13
ي	1.05
,"	1.02
,”	1.01
effic	0.98
",	0.97
explain	0.92
autre	0.91
,'	0.91
Positive Logits
Token	Logit
ت	1.12
л	1.06
h	0.99
м	0.97
舩	0.93
चांगले	0.92
لد	0.91
ام	0.90
م	0.89
در	0.88
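The page does not say how the two logit lists are produced. The sketch below assumes one common approach: project the feature's decoder direction through the model's unembedding matrix, which gives one score per vocabulary token, then take the k lowest and k highest scores. The names `decoder_dir`, `W_U`, and `logit_effects` are hypothetical. They are not taken from this dashboard or from any specific library.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical sketch: rank vocabulary tokens by how strongly a
    feature's decoder direction raises or lowers their logits.

    decoder_dir: (d_model,) decoder vector for one feature (assumed)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed)
    vocab:       list of token strings, length vocab_size
    """
    effects = decoder_dir @ W_U          # one score per token, shape (vocab_size,)
    order = np.argsort(effects)          # token indices sorted by ascending score
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

Here is a toy run with a 4-token vocabulary. An identity unembedding makes the scores equal to the decoder vector itself: `logit_effects(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), ["a", "b", "c", "d"], k=2)` returns `[("b", -1.0), ("d", 0.0)]` as the negative list and `[("c", 2.0), ("a", 0.5)]` as the positive list.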
Activation density: 0.444%
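Activation density is usually read as the percentage of tokens on which the feature fires. At 0.444%, that is roughly 4 to 5 tokens in every 1,000, or about 1 in 225. The sketch below is one way to compute the figure. It assumes that a token counts as "active" when the activation exceeds a threshold of 0. The function name and the threshold are illustrative assumptions, not this dashboard's definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of tokens on which a feature fires.

    activations: 1-D sequence of the feature's activation on each token
    threshold:   a token counts as active if its activation > threshold (assumed)
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

For example, `activation_density([0, 0, 0.3, 0, 1.2, 0, 0, 0])` returns `25.0`, because 2 of the 8 tokens are active.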