INDEX
Explanations
conjunctions followed by explanations
Negative Logits
l         0.47
máme      0.46
tenemos   0.42
kita      0.41
’         0.41
n         0.41
ERO       0.40
ิ         0.40
ﺔ         0.40
máte      0.40
Positive Logits
thereby        0.41
inaccur        0.40
captivated     0.40
점에서         0.40
inextricably   0.40
troubled       0.39
narrowly       0.39
prompting      0.39
draped         0.39
noticeably     0.38
Activations Density: 0.942%