Explanations
punctuation and common words
Negative Logits
,      1.11
.      0.68
ר      0.62
ق      0.61
ان      0.59
ー      0.59
í      0.57
ل      0.55
cular      0.54
er      0.53
Positive Logits
valamint      0.82
odnosno      0.77
illetve      0.76
जबकि      0.74
czyli      0.73
który      0.68
hanno      0.67
sogenannte      0.66
takže      0.64
اتارنا      0.63
Activations Density: 0.250%