INDEX
Explanations
phrases related to dramatic events or impactful moments
NEGATIVE LOGITS
beck
-0.17
レット
-0.15
атков
-0.15
průběhu
-0.14
بواسطة
-0.14
.persistent
-0.14
irse
-0.14
cheid
-0.14
.sa
-0.14
 م
-0.14
POSITIVE LOGITS
bang
0.57
Bang
0.55
boom
0.54
Boom
0.50
Bang
0.49
Bam
0.48
bang
0.45
BAM
0.43
bam
0.42
boom
0.41
Activations Density 0.107%