INDEX
Explanations
references to significant actions or events within a narrative context
Negative Logits
θÎŃ    -0.15
ÙħØŃ    -0.14
ldr    -0.14
çĸĨ    -0.14
werk    -0.14
lict    -0.13
lett    -0.13
linger    -0.13
èĮĥ    -0.13
bins    -0.13
Positive Logits
ekler    0.16
ACE    0.16
details    0.15
CD    0.15
otec    0.15
Bilim    0.14
detail    0.14
Ace    0.14
adaki    0.14
branches    0.14
Activations Density 0.001%