Explanations
references to "other" entities or categories in various contexts
Negative Logits
Token       Logit
itself      -0.47
itself      -0.46
︎           -0.41
itſelf      -0.39
┛           -0.39
itulah      -0.39
both        -0.38
之旅        -0.35
something   -0.35
PhysRevD    -0.34
Positive Logits
Token       Logit
worldly     1.34
than        0.98
niż         0.88
similarly   0.77
equally     0.77
decât       0.77
THAN        0.74
similar     0.73
similar     0.69
liknande    0.68
Activations Density 0.303%