Explanations
geographical names and references to locations
Negative Logits
hind        -0.15
zier        -0.14
Dong        -0.14
agini       -0.13
æ²          -0.13
пад         -0.13
elper       -0.13
Demir       -0.13
uela        -0.13
iets        -0.13
Positive Logits
Ann         0.21
ANN         0.20
ANN         0.18
Rack        0.17
ann         0.16
Wolver      0.16
Ann         0.16
ults        0.16
apiro       0.16
Wolverine   0.15
Activations Density 0.013%