Explanations
same concept across contexts
Negative Logits
ciendo         0.45
attup          0.44
COD            0.42
Experts        0.42
جهات           0.42
праців         0.42
শ্য            0.41
的具体         0.41
ందని           0.40
professionals  0.40
Positive Logits
same    0.56
Same    0.52
same    0.51
Same    0.48
misma   0.45
selben  0.44
mismo   0.44
같은    0.44
Ibid    0.43
SAME    0.42
Activations Density: 0.000%