Explanations
phrases related to descriptions and historical events, particularly regarding significant figures and locations
Negative Logits
okie      -0.16
avl       -0.16
Kiev      -0.16
Laud      -0.15
aval      -0.15
uv        -0.15
gy        -0.15
aurant    -0.14
enant     -0.14
alte      -0.14
Positive Logits
Ali       1.13
Ali       1.00
ali       0.86
ali       0.76
Alien     0.68
ALI       0.66
aliens    0.65
.ali      0.64
alien     0.62
علي       0.54
Activations Density 0.135%