Explanations
personal autobiography and responsibility
Negative Logits
ت  1.52
ك  1.50
خ  1.36
一  1.30
ول  1.20
يا  1.20
ಮ  1.16
ла  1.15
አ  1.13
ల  1.11
Positive Logits
↵ (newline)  1.11
ни  0.96
us  0.94
é  0.90
personal  0.90
่  0.86
(unrendered token)  0.86
ur  0.85
pribadi  0.85
is  0.82
Activation density: 0.060%
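An activation density of 0.060% is a fraction of 0.0006, or roughly 6 out of every 10,000 tokens. A minimal sketch of the calculation follows. It assumes density means the share of tokens on which the feature's activation is above zero. The helper `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact firing threshold is not stated here.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example: the feature fires on 6 of 10,000 tokens.
acts = np.zeros(10_000)
acts[:6] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

Printing with a percent format turns the fraction 0.0006 into the "0.060%" style shown on the card.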