INDEX
Explanations
references to the word "Burn" and its variations
Negative Logits
licit    -0.08
ditor    -0.07
θε       -0.07
åİħ      -0.07
anford   -0.07
gger     -0.07
ually    -0.07
uale     -0.07
ulin     -0.06
quare    -0.06
Positive Logits
outs     0.08
ishing   0.08
ished    0.08
ðŁĶ      0.07
out      0.07
side     0.07
away     0.07
Lazar    0.07
çĩ       0.07
iece     0.07
Activations Density 0.013%