Explanations
references to the word "what" in various contexts
what followed by pronouns or verbs
Negative Logits
Verge           -0.56
☆☆              -0.55
AMC             -0.54
mez             -0.54
مشين            -0.52
str             -0.52
liez            -0.52
erl             -0.50
P               -0.50
oucí            -0.50
Positive Logits
WHAT            1.04
WHAT            0.98
what            0.98
What            0.95
what            0.91
What            0.88
ArgsConstructor 0.79
happened        0.78
happens         0.77
<=",            0.74
Activations Density: 0.127%