Explanations
references to structural failures or deficiencies in systems
Negative Logits
sz        -0.17
ollider   -0.15
usra      -0.15
piler     -0.15
ecast     -0.15
upakan    -0.14
spar      -0.14
azers     -0.14
Æ         -0.14
zure      -0.14
Positive Logits
ÎĦ        0.26
.gr       0.24
Greek     0.21
Athens    0.21
ο         0.19
Greece    0.19
Dimit     0.19
akis      0.19
oulos     0.19
Kou       0.18
Activations Density: 0.117%