INDEX
Explanations
geographical names and locations
Negative Logits
Redditor   -0.78
Britann    -0.71
ks         -0.68
cession    -0.67
fusc       -0.65
artz       -0.64
dule       -0.63
fing       -0.62
Sparrow    -0.62
chair      -0.61
Positive Logits
VILLE      1.07
ENN        1.04
OUN        1.01
GOODMAN    0.99
ING        0.98
OU         0.98
EDITION    0.98
ISH        0.97
ALL        0.95
INC        0.95
Activations Density 0.082%