Explanations
relative pronouns after punctuation
Negative Logits
Token    Logit
,        0.57
)        0.48
.        0.43
"        0.43
."       0.39
]        0.38
}        0.37
<.       0.37
。       0.37
).       0.36
Positive Logits
Token      Logit
который    0.75
которая    0.75
जिसमें      0.71
jossa      0.71
which      0.67
which      0.66
которое    0.66
která      0.65
która      0.63
ktorá      0.63
Activations Density: 0.016%