Explanations
US state names
references to geographical entities and their political contexts
Negative Logits
akes       -0.64
BF         -0.61
ifiers     -0.60
ickers     -0.58
CODE       -0.58
inject     -0.58
adder      -0.57
advers     -0.57
cakes      -0.56
augment    -0.56
Positive Logits
zbek       0.72
Whitman    0.71
Truck      0.70
Tuc        0.69
Thousand   0.69
Vermont    0.69
Vulcan     0.69
UNIVERS    0.69
Vatican    0.68
sov        0.67
Activations Density 0.124%