INDEX
Explanations
references to the word "who"
Negative Logits
Token       Logit
ซ์          -0.61
transporta  -0.60
lans        -0.59
Erle        -0.58
しかない    -0.57
a           -0.56
nica        -0.56
を与える    -0.56
ן           -0.56
**/         -0.55
Positive Logits
Token       Logit
who         2.10
Who         2.10
Who         2.02
who         1.94
WHO         1.80
WHO         1.80
hvem        1.59
whom        1.58
quién       1.56
quién       1.55
Activations Density: 0.039%