Explanations
The word "linda", with varying activation strength.
Occurrences of the name "Linda".
Negative Logits
frost        -0.71
platforms    -0.64
draft        -0.60
safely       -0.60
vault        -0.59
rated        -0.57
Snap         -0.57
post         -0.57
Champions    -0.57
/+           -0.57
Positive Logits
inda         4.35
issa         1.22
antha        1.17
indu         1.15
inia         1.15
ind          1.13
uti          1.08
ydia         1.06
INTON        1.06
Linda        1.05
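The page does not say how these logit lists are produced. A common approach for dashboards like this, assumed here rather than confirmed by the source, is to project the feature's decoder direction straight through the model's unembedding matrix and read off the tokens with the most negative and most positive scores. The sketch below follows that assumption; `logit_effects`, `decoder_vec`, and `W_U` are hypothetical names, and the model's final layer norm is ignored for simplicity.

```python
import numpy as np


def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by how strongly one feature's output direction moves their logits.

    Assumptions (not from the source dashboard):
      decoder_vec: shape (d_model,), the feature's decoder direction.
      W_U: shape (d_model, vocab_size), the model's unembedding matrix.
      vocab: token strings, index-aligned with the columns of W_U.
    """
    # Direct projection into logit space; any final layer norm is ignored.
    scores = decoder_vec @ W_U
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a toy two-dimensional model, `logit_effects(np.array([2.0, 1.0]), np.array([[1.0, -0.5, 0.25], [0.0, 0.5, -0.5]]), ["inda", "frost", "post"], k=1)` returns `[("frost", -0.5)]` as the negative list and `[("inda", 2.0)]` as the positive list.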
Activation density: 0.014%
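Activation density is usually read as the fraction of tokens on which the feature is active. The page does not state the threshold or the dataset used, so the sketch below assumes "active" means an activation strictly above zero; `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    acts = np.asarray(activations, dtype=float)
    # Boolean mask of "firing" tokens; its mean is the firing fraction.
    return float(np.mean(acts > threshold))
```

Under that reading, a density of 0.014% (0.00014) means the feature fires on roughly 14 of every 100,000 tokens, which fits a feature tied to a single name.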