INDEX
Explanations
references to repetition or similarity
doing the same
Negative Logits
urlpatterns      -0.52
RTEX             -0.51
avoient          -0.47
dermatologist    -0.42
Obrador          -0.40
vertrou          -0.40
anoma            -0.39
étoient          -0.38
DllImport        -0.38
råd              -0.38
Positive Logits
<>",             0.56
同様に           0.55
Likewise         0.55
similarly        0.53
retudo           0.52
同样             0.52
same             0.52
likewise         0.52
ditto            0.51
Likewise         0.51
Activations Density 0.010%