Explanations
affirmation and auxiliary verbs
Negative Logits
Token       Logit
Unless      0.40
Essentially 0.36
Unless      0.36
절대        0.35
するのが    0.35
Dane        0.34
絶対        0.34
чної        0.33
๊ก          0.33
сло         0.33
Positive Logits
Token       Logit
确实        1.67
DOES        1.66
DID         1.57
does        1.55
確實        1.52
does        1.42
do          1.39
did         1.38
DO          1.32
memang      1.28
Activations Density 0.192%
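
The page does not say how the density figure is computed. A minimal sketch, assuming the common reading that it is the fraction of sampled tokens on which the feature's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not from the dashboard): 1000 tokens, the feature fires on 2.
sample = [0.0] * 998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.192% would mean the feature fires on roughly 2 of every 1000 sampled tokens, which fits a narrow feature keyed to emphatic "do/does/did" and similar affirmation words.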