INDEX
Explanations
proper nouns or names
references to organizational structures or identities
Negative Logits
Cornwall    -0.75
Statistics  -0.71
agne        -0.70
Hat         -0.69
burg        -0.65
enery       -0.65
Alps        -0.64
ãĥ´ãĤ¡      -0.64
Bundes      -0.63
Channel     -0.63
Positive Logits
JECT        1.22
OB          1.21
ilib        0.99
rien        0.95
ooth        0.91
tained      0.91
acter       0.89
BY          0.89
bably       0.88
oby         0.87
Activations Density: 0.004%