INDEX
Explanations
descriptive phrases, categorizing concepts
NEGATIVE LOGITS
见    0.38
摘要    0.36
見    0.36
เห็น    0.36
الظ    0.36
得分    0.35
saw    0.35
வரவே    0.34
SOURCES    0.34
thấy    0.34
POSITIVE LOGITS
correct    0.43
ျေး    0.42
piss    0.42
нен    0.41
lis    0.41
protects    0.40
anis    0.39
ether    0.39
filing    0.38
OLS    0.38
Activations Density 0.000%