INDEX
Explanations
occurrences of high-frequency, attention-seeking words or phrases
Negative Logits
Token             Logit
stit              -0.15
jerne             -0.14
NEG               -0.14
Jeh               -0.14
earch             -0.14
ì¶Ķ               -0.13
estone            -0.13
åŃĿ               -0.13
oes               -0.13
experimentation   -0.13
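Three entries in these lists (ì¶Ķ and åŃĿ here, çĶ among the positive logits) look like encoding debris, but they are most likely byte-level BPE tokens rendered through the GPT-2 byte-to-character table. The page does not name the tokenizer, so the sketch below assumes a GPT-2-style byte-level vocabulary; the helper names `bytes_to_unicode`, `token_bytes` and `try_decode` are illustrative. Under that assumption ì¶Ķ is the bytes EC B6 94, which decode to the Hangul syllable 추 (U+CD94), åŃĿ decodes to the CJK ideograph 孝 (U+5B5D), and çĶ is only the first two bytes of a three-byte UTF-8 character, so it has no standalone decoding.

```python
from typing import Dict, Optional


def bytes_to_unicode() -> Dict[int, str]:
    """Byte -> display-character table, as used by GPT-2-style byte-level BPE."""
    # Printable bytes ('!'..'~', U+00A1..U+00AC, U+00AE..U+00FF) map to themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> raw byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level BPE token."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def try_decode(token: str) -> Optional[str]:
    """Decode the token as UTF-8, or return None if it is only part of a character."""
    try:
        return token_bytes(token).decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for tok in ["ì¶Ķ", "åŃĿ", "çĶ", "estone"]:
        print(repr(tok), token_bytes(tok), try_decode(tok))
```

Plain ASCII tokens such as `estone` pass through unchanged, which is why only the multi-byte entries look garbled on the page.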
Positive Logits
Token             Logit
SharedPtr         0.15
iov               0.15
çĶ                0.15
lag               0.15
IDES              0.15
__,__             0.15
eya               0.15
unda              0.15
oute              0.14
eko               0.14
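One common way to produce lists like these is to project the feature's decoder direction through the model's unembedding matrix and read off the largest and smallest entries. This dashboard does not say how its values were computed (for example, whether a final LayerNorm scale is folded in), so the following is only a sketch of that assumed method using toy numbers, not the page's data; `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of a feature direction on each vocabulary logit: d @ W_U.

    `unembed` is W_U with shape (d_model, n_vocab), given as a list of rows.
    """
    n_vocab = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(n_vocab)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example with made-up values: d_model = 2, three-token vocabulary.
    vocab = ["a", "b", "c"]
    decoder_dir = [1.0, -0.5]
    unembed = [[0.2, -0.1, 0.0],
               [0.1, 0.3, -0.2]]
    effects = logit_effects(decoder_dir, unembed)
    neg, pos = top_and_bottom(effects, vocab, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

On a real model the vocabulary has tens of thousands of entries and `k` would be 10, matching the length of each list on this page.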
Activation Density 0.009%
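As a reading aid, a density of 0.009% means the feature is active on roughly 9 of every 100,000 tokens, so it is a very sparse feature. The page does not state its token sample or firing threshold; the sketch below assumes density is the fraction of token positions whose activation is strictly above zero, and `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires above `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    return firing / total if total else 0.0


# Toy sample: 9 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 90_000, 10_000):
    acts[i] = 1.0

print(f"{activation_density(acts):.3%}")  # prints 0.009%
```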