INDEX
Explanations
mentions of the word "wer" with varying levels of activation
the word "wer" in various forms, signaling a focus on variations of that term
Negative Logits
dogs            -0.64
ghosts          -0.64
PTSD            -0.64
accompanying    -0.64
Jinping         -0.62
makeup          -0.62
parenting       -0.61
esp             -0.60
âĹı             -0.60
paramedics      -0.60
Positive Logits
wer             4.76
WER             1.65
wered           1.31
wark            1.18
Wer             1.14
ws              1.12
swer            1.11
wen             1.10
wed             1.04
w               1.04
Activations Density 0.011%