INDEX
Explanations
Phrases that refer to the beginnings of events
Negative Logits
ensch       -0.17
HEMA        -0.15
bane        -0.15
obel        -0.15
[:]         -0.14
.copyWith   -0.14
pillar      -0.14
opo         -0.14
#           -0.13
apo         -0.13
Positive Logits
ÙĦس         0.17
ëŀ          0.16
TM          0.15
498         0.14
orf         0.14
iciel       0.14
261         0.14
869         0.14
903         0.14
661         0.14
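The two lists are the most negative and most positive entries of the feature's effect on the output vocabulary. A minimal sketch of how such ranked lists could be produced follows. It assumes the per-token scores are already available as a plain list aligned with the vocabulary (for example, the feature's decoder direction projected through the unembedding; that derivation is an assumption, not something this page states). The function name `top_logits` is hypothetical, and the toy data simply reuses a few values from the lists.

```python
from typing import List, Tuple


def top_logits(
    vocab: List[str], logit_effects: List[float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: assumes logit_effects[i] is the score for vocab[i].
    """
    pairs = list(zip(vocab, logit_effects))
    # Ascending order puts the most negative scores first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending order puts the most positive scores first.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy data reusing a few values from the lists above.
vocab = ["ensch", "HEMA", "TM", "orf", "pillar"]
effects = [-0.17, -0.15, 0.15, 0.14, -0.14]
neg, pos = top_logits(vocab, effects, k=2)
print(neg)  # [('ensch', -0.17), ('HEMA', -0.15)]
print(pos)  # [('TM', 0.15), ('orf', 0.14)]
```

Sorting the full list is the simplest choice here. For a vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.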
Activation Density: 0.022%
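Activation density is read here as the percentage of token positions on which the feature fires. That reading, including the assumption that "fires" means an activation strictly above zero, is ours and is not stated on this page. A minimal sketch under those assumptions follows. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes one activation value per token position.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of 10,000 tokens gives a density of 0.01%.
acts = [0.0] * 9999 + [1.3]
print(f"{activation_density(acts):.3f}%")  # 0.010%
```

At this scale, a density of 0.022% corresponds to roughly 2 active positions per 10,000 tokens.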