INDEX
Explanations
references to body parts
Negative Logits
heny              -0.62
arenthood         -0.62
urdue             -0.58
versely           -0.57
Recomm            -0.56
qua               -0.56
theless           -0.54
interaction       -0.54
rarity            -0.53
hallucinations    -0.53
Positive Logits
plug              0.70
Plug              0.70
asses             0.69
Ĥ¬                0.66
ashington         0.65
ħĭ                0.64
horn              0.63
ache              0.63
burned            0.62
blown             0.62
Activations Density 0.057%