INDEX
Explanations
words related to explosions or explosive events
Negative Logits
ein       -0.18
684       -0.17
635       -0.16
oucher    -0.16
ulen      -0.15
ebek      -0.15
Ïħνα      -0.15
588       -0.15
asley     -0.15
Hoch      -0.15
Positive Logits
lass         0.18
stag         0.16
lant         0.16
.argument    0.16
utsch        0.15
antine       0.14
ãĥĥãĤ°       0.14
adden        0.13
AML          0.13
recht        0.13
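
Tokens such as ãĥĥãĤ° look garbled because byte-level BPE vocabularies are often displayed through a printable-character mapping of raw bytes. The sketch below assumes this vocabulary uses the GPT-2 style byte-to-unicode mapping (an assumption; the page does not name the tokenizer) and inverts it to recover the underlying UTF-8 text. The helper names `byte_to_unicode_table` and `decode_vocab_token` are hypothetical.

```python
def byte_to_unicode_table():
    """Rebuild the GPT-2 style mapping from byte values to printable characters."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


def decode_vocab_token(token):
    """Map a displayed vocabulary string back to text (assumes the mapping above)."""
    unicode_to_byte = {ch: b for b, ch in byte_to_unicode_table().items()}
    raw = bytes(unicode_to_byte[ch] for ch in token)
    # Tokens can end mid-character, so undecodable bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


decoded = decode_vocab_token("ãĥĥãĤ°")
print(decoded)
```

Under that assumption the token decodes to two katakana characters. A string that mixes mapped and already-decoded characters would raise a `KeyError` here, which is a hint that it was not produced by this mapping alone.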
Activations Density 0.002%
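
As a rough illustration of what a density of 0.002% means, the sketch below treats activation density as the fraction of sampled tokens on which the feature fires above a threshold. This definition, the function name `activation_density`, and the sample data are assumptions for illustration, not details taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    total = len(activations)
    if total == 0:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / total


# Hypothetical sample: the feature fires on 2 of 100,000 tokens.
sample = [0.0] * 100_000
sample[10] = 1.5
sample[500] = 0.3

density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

At this density the feature would be expected to fire on about 2 tokens in every 100,000, which is consistent with a narrow, topic-specific feature.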