INDEX
Explanations
places and geographic locations
geographic names and locations
Negative Logits
censored        -0.72
superheroes     -0.69
Australians     -0.68
inexperienced   -0.68
governments     -0.67
feds            -0.66
CEOs            -0.66
reviewer        -0.65
editors         -0.64
Bezos           -0.63
Positive Logits
bah              1.06
qqa              1.05
unda             1.01
abad             1.00
Cemetery         0.97
ulla             0.97
raq              0.96
pora             0.96
ridor            0.95
ovo              0.95
Activation Density 0.262%