Explanations
phrases emphasizing specific instances or notable moments within a context
Negative Logits
Token       Logit
amac        -0.16
Hughes      -0.15
_PCM        -0.15
zan         -0.15
omm         -0.15
ikan        -0.15
essaging    -0.14
athom       -0.14
BCM         -0.14
ifest       -0.14
Positive Logits
Token       Logit
bulk        0.16
ucci        0.15
że          0.15
axon        0.14
oka         0.14
uate        0.14
ulis        0.14
ichni       0.14
_gb         0.14
ÙĨدر        0.14
Activations Density 0.086%