INDEX
Explanations
the presence of the word "here" in various contexts
Negative Logits
Ethiopian    -0.52
Apoll        -0.50
Tibetan      -0.50
опро         -0.48
Gogh         -0.48
Fanning      -0.47
pAd          -0.47
Polonia      -0.47
PD           -0.47
Draft        -0.46
Positive Logits
here         1.16
Here         1.10
here         1.09
Here         1.08
HERE         1.06
HERE         1.00
aquí         0.93
Aquí         0.91
tää          0.90
здесь        0.84
Activations Density: 0.088%