INDEX
Explanations
instances of the word "which."
Negative Logits
Token        Logit
cy           -0.75
o            -0.67
ys           -0.66
ته           -0.66
Baton        -0.64
ded          -0.64
ことはない   -0.63
Hov          -0.62
ed           -0.62
e            -0.62
Positive Logits
Token        Logit
WHICH        1.47
which        1.41
Which        1.37
which        1.35
Which        1.27
Datuak       1.26
wich         1.14
hich         1.06
ซึ่ง          1.06
]**          1.04
Activations Density 0.166%
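
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all, not a mean or a weighted measure.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 token positions.
toy_activations = [0.0] * 998 + [1.5, 3.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.166% would mean the feature fires on roughly one token position in 600, which fits a feature tied to a single word and its variants.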