INDEX
Explanations
names related to various locations and individuals
names and shout-outs related to people or entities
Negative Logits
cig          -0.71
«ĺ           -0.69
¬¼           -0.68
Labrador     -0.68
dummy        -0.67
occas        -0.64
Viking       -0.64
Fenrir       -0.63
ablishment   -0.62
restraints   -0.61
Positive Logits
Shak         1.04
lar          0.93
arak         0.90
aji          0.90
ti           0.88
ilo          0.86
spe          0.86
uras         0.85
htar         0.85
tered        0.85
Activations Density: 0.011%