INDEX
Explanations
specific instances or mentions of the word "distraction"
references to the term "distribution" in various contexts
Negative Logits
glers     -1.12
swick     -1.00
tes       -0.98
ppo       -0.82
terday    -0.74
ENA       -0.74
gery      -0.72
cair      -0.71
uberty    -0.70
Slater    -0.69
Positive Logits
ribut        1.29
ributed      1.22
inguished    1.09
ribute       1.07
ribution     1.06
illery       1.02
ortion       1.00
rict         0.99
Dist         0.97
enfranch     0.96
Activations Density: 0.003%