Explanations
finding extremes by comparison
Negative Logits
лно        0.38
inversa    0.36
lau        0.36
solv       0.35
przes      0.35
सब         0.35
quanto     0.34
༘          0.34
row        0.34
partner    0.33
Positive Logits
comparisons    0.80
Comparisons    0.75
Comparison     0.75
срав           0.70
сравнению      0.69
comparison     0.69
Compar         0.68
comparing      0.68
Comparison     0.68
сравни         0.68
Activations Density 0.057%