Explanations
comparisons and contrasts between different regions, historical events, or societal issues
Negative Logits
Token    Logit
Bylo     -0.17
urum     -0.16
Spec     -0.15
bert     -0.14
alon     -0.14
apesh    -0.14
ogl      -0.14
zs       -0.14
ogue     -0.14
ÙĤب      -0.14
Positive Logits
Token          Logit
similar        0.21
experience     0.19
comparable     0.18
similar        0.17
imilar         0.16
experi         0.16
experience     0.16
past           0.16
rollo          0.16
experiencia    0.16
Activations Density: 0.305%