Explanations
phrases related to sizes, comparison, and groups of people or entities
connections between large and small entities or groups
Negative Logits
Token       Logit
Rabbit      -0.70
Reilly      -0.69
McAuliffe   -0.68
Hack        -0.67
Nieto       -0.67
Wilde       -0.66
Grail       -0.66
Lite        -0.65
Valkyrie    -0.65
Chau        -0.63
Positive Logits
Token       Logit
arsen       0.94
anguage     0.81
ategories   0.79
Islands     0.76
hran        0.75
yz          0.75
cules       0.74
ritical     0.72
cious       0.71
alle        0.71
Activations Density: 0.258%