Explanations
questions asking how something works or how to accomplish a task
Negative Logits
cu  -0.56
(token not rendered)  -0.53
[  -0.53
жен  -0.52
ci  -0.49
te  -0.49
[  -0.48
!  -0.47
bari  -0.47
fournir  -0.46  (French "to provide")
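Positive Logits
how  1.30
itſelf  1.28  (long-s spelling of "itself")
myſelf  1.17  (long-s spelling of "myself")
कैसे  1.17  (Hindi "how")
Nasıl  1.17  (Turkish "how")
Hvordan  1.15  (Danish/Norwegian "how")
איך  1.14  (Hebrew "how")
Hvordan  1.13  (Danish/Norwegian "how")
hvordan  1.12  (Danish/Norwegian "how")
چگونه  1.10  (Persian "how")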
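Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. The resulting per-token scores are then ranked. The page itself does not state its method, so the sketch below only assumes this approach. The helper `logit_effects`, the four-token vocabulary, and the small matrices are hypothetical toy values, not data from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: d_model rows x vocab_size columns
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (bottom-k, top-k) tokens by direct logit effect."""
    d_model = len(decoder_dir)
    # Assumption: the effect on token j is the dot product of the
    # decoder direction with column j of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort once (ascending), then read both ends of the ranking.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
vocab = ["how", "cu", "Nasıl", "!"]
W_U = [
    [1.0, -0.5, 0.8, 0.0],
    [0.5, -0.2, 0.3, -0.4],
]
w_dec = [1.0, 0.5]
neg, pos = logit_effects(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. A real vocabulary of tens of thousands of tokens would normally be handled with a vectorized array library rather than nested Python loops.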
Activation Density 0.326%
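Activation density is commonly reported as the share of dataset tokens on which the feature's activation is nonzero. The sketch below assumes that definition, since the page does not spell it out. The helper `activation_density`, its `threshold` parameter, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Assumption: a token counts as "active" when its activation is strictly above threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations over 8 tokens; 2 of them are nonzero.
acts = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 1.7, 0.0]
print(f"{activation_density(acts):.3f}%")  # prints 25.000%
```

Under that definition, the 0.326% density reported here means the feature fires on roughly 3 of every 1,000 tokens.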