INDEX
Explanations
the word "who" in various contexts
Negative Logits
らう         -0.73
Eis          -0.71
Steck        -0.70
alis         -0.64
Merk         -0.64
Wellen       -0.62
έκ           -0.62
ます         -0.61
Augustin     -0.61
Veer         -0.59
Positive Logits
who          2.00
who          1.66
WHO          1.55
Who          1.53
Who          1.47
WHO          1.40
którzy       1.35
whom         1.34
whom         1.25
quien        1.24
Activations Density: 0.101%