Explanations
development and culture transitions
Negative Logits
gebnis    0.47
show      0.46
спек      0.44
went      0.43
ng        0.42
сно       0.42
ocial     0.41
setShow   0.41
ör        0.41
ativ      0.41
Positive Logits
azonban   0.57
entanto   0.55
however   0.52
HOWEVER   0.52
però      0.51
이지만    0.48
tačiau    0.48
interpr   0.46
όμως      0.45
namun     0.44
Activations Density 0.008%