INDEX
Explanations
best, worst, cheapest, simplest
Negative Logits
most      1.85
наиболее  1.64
найбільш  1.57
most      1.49
MOST      1.47
Most      1.43
Most      1.39
MOST      1.39
最も      1.36
paling    1.35
Positive Logits
terd      0.83
abl       0.79
Nic       0.77
Tou       0.74
coars     0.73
spars     0.72
Nic       0.72
ますが    0.72
sadd      0.71
ear       0.70
Activations Density 0.210%