INDEX
Explanations
flawed, biased, corrupted, fake
Negative Logits
l            0.55
langan       0.53
en           0.49
o            0.47
Clouds       0.46
Computed     0.46
el           0.45
ermöglichen  0.45
lan          0.44
et           0.44
Positive Logits
inanimate    0.45
tiver        0.45
inizin       0.44
healed       0.44
undead       0.44
widowed      0.43
receptive    0.43
poking       0.42
'-           0.42
layak        0.42
Activations Density 0.029%