Explanations
indicating error or deviation
Negative Logits
ت           1.26
a           1.17
т           0.99
esque       0.97
detailed    0.87
yek         0.79
e           0.79
site        0.76
detailed    0.76
Detailed    0.73
Positive Logits
givings     1.08
Fortune     1.06
lıkla       1.03
anlaş       1.02
genutzt     0.99
lections    0.98
käs         0.97
isch        0.96
uable       0.96
omyelitis   0.95
Activations Density: 0.048%