Explanations
the word "that" in various forms and contexts
Negative Logits
PHA       -0.60
mer       -0.57
Royce     -0.56
vif       -0.53
Magi      -0.52
MV        -0.52
Trimble   -0.51
tanong    -0.50
塁        -0.50
vements   -0.49
Positive Logits
THAT      1.69
That      1.62
That      1.60
THAT      1.56
that      1.55
that      1.34
ese       1.06
thats     1.05
Thats     1.01
this      0.99
Activations Density 0.572%