Explanations
Phrases that indicate relationships or conditions between entities, often expressed through relative clauses.
Negative Logits
dė          -0.59
らう        -0.57
醐          -0.53
ferie       -0.53
vieja       -0.53
fallen      -0.52
ratify      -0.52
waking      -0.52
Team        -0.52
drücken     -0.52
Positive Logits
who         1.08
ambao       0.94
which       0.93
[]:         0.92
οποίο       0.91
which       0.86
które       0.85
who         0.84
quien       0.83
والذي       0.82
Activations Density 0.373%