INDEX
Explanations
the word "here"
Negative Logits
here      -3.20
aquí      -2.31
here      -2.27
disini    -2.08
ici       -2.03
Here      -1.95
هنا       -1.95
Here      -1.93
aici      -1.91
aqui      -1.88
Positive Logits
(         0.65
↵         0.62
<eos>     0.62
          0.58
::        0.56
The       0.56
↵↵        0.55
This      0.54
(         0.53
"         0.52
Activations Density 0.968%