Explanations
place of refuge or special interest
Negative Logits
0.58
rn  0.58
grandes  0.54
italiano  0.53
Г  0.51
হেলিকপ্ট  0.51
idi  0.50
printemps  0.50
desliz  0.50
кури  0.50
Positive Logits
f  0.96
ве  0.71
ла  0.68
,  0.64
да  0.57
त्र  0.57
↵  0.57
ба  0.57
००  0.57
c  0.57
Activations Density 0.001%