Explanations
mentions of the term "Pen" at a high level of activation
Negative Logits
mono          -0.65
orically      -0.56
ãĤ·           -0.56
åį            -0.55
à¦            -0.55
ãĥŁ           -0.54
vag           -0.54
è¦ļéĨĴ        -0.54
使           -0.54
ÑĢ            -0.53
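Several negative-logit tokens (`ãĤ·`, `è¦ļéĨĴ`, `ÑĢ`) look garbled. This assumes the model uses a GPT-2-style byte-level BPE vocabulary, which this page does not state. Under that assumption, each token is displayed through a fixed byte-to-character table rather than as decoded text. The minimal sketch below inverts that table. `bytes_to_unicode` mirrors the GPT-2 reference table, and `decode_token` is a hypothetical helper. Tokens that hold only part of a multi-byte UTF-8 character, such as `åį` and `à¦`, cannot be decoded on their own and come back as the replacement character U+FFFD.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte (0-255) to a printable character (assumed tokenizer)."""
    # Printable Latin-1 bytes are displayed as themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    code_points = byte_values[:]
    # The remaining bytes (space, control characters, etc.) are shifted to U+0100 and above.
    shift = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into text; incomplete UTF-8 becomes U+FFFD."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ·", "ãĥŁ", "è¦ļéĨĴ", "ÑĢ", "åį"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `ãĤ·` decodes to "シ", `ãĥŁ` to "ミ", `è¦ļéĨĴ` to "覚醒", and `ÑĢ` to the Cyrillic letter "р".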
Positive Logits
cil           1.32
alties        1.12
elope         1.09
itent         1.02
insula        0.98
nington       0.91
esville       0.86
ning          0.85
etr           0.82
issance       0.82
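On dashboards of this kind, the two logit lists are typically produced by projecting the feature's decoder direction through the model's unembedding matrix. The largest and smallest entries are then read off. The sketch below assumes that procedure and a `(d_model, vocab_size)` unembedding layout. `top_logits` and the toy weights are hypothetical and do not come from the real model.

```python
import numpy as np


def top_logits(direction: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Project a feature direction through the unembedding and return the extreme tokens.

    direction: (d_model,) decoder direction of the feature (assumed).
    w_u:       (d_model, vocab_size) unembedding matrix (assumed layout).
    """
    logits = direction @ w_u          # one logit contribution per vocabulary token
    order = np.argsort(logits)        # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights (not the real model): 2-dim residual stream, 4 tokens.
vocab = ["cil", "mono", "ning", "vag"]
w_u = np.array([[1.0, -0.5, 0.2, 0.0],
                [0.0, 0.5, 0.1, -1.0]])
direction = np.array([1.0, 1.0])
positive, negative = top_logits(direction, w_u, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Here "cil" and "ning" come out on the positive side and "vag" and "mono" on the negative side. Only the ranking matters for lists like the ones above; the absolute values depend on the real weights.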
Activation density: 7.810%
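Activation density is read here as the share of sampled tokens on which the feature fires. On that reading, 7.810% corresponds to roughly 78 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. Both that threshold and the sample array are illustrative assumptions, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    return 100.0 * float(np.mean(activations > threshold))


# Hypothetical activations of one feature over eight tokens (two are nonzero).
acts = np.array([0.0, 2.4, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0])
print(f"Activation density: {activation_density(acts):.3f}%")
```

With two of eight tokens active, this prints a density of 25.000%.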