INDEX
Explanations
Instances of the word "show" and its variations, indicating a focus on demonstration or presentation.
Negative Logits
İ         -0.15
pch       -0.14
ucken     -0.14
landır    -0.14
ilogy     -0.13
essel     -0.13
leared    -0.13
nish      -0.13
arih      -0.13
ará       -0.13
Positive Logits
signs     0.35
how       0.34
off       0.32
-off      0.29
boat      0.28
up        0.28
why       0.28
Signs     0.27
off       0.26
-case     0.26
Activations Density 0.095%
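
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it means the fraction of sampled tokens on which the feature has a nonzero activation: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 2,000 tokens, of which 2 are nonzero.
sample = [0.0] * 1998 + [1.3, 0.7]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.095% would mean the feature fires on roughly one token in a thousand.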