INDEX
Explanations
phrases related to societal or political power dynamics
references to entities or groups identified by the suffix "s"
Negative Logits
bear          -0.72
esp           -0.70
Tap           -0.69
tags          -0.68
CLAIM         -0.67
š             -0.66
fish          -0.66
tk            -0.65
Rothschild    -0.64
horse         -0.63
Positive Logits
own            0.96
selves         0.95
plight         0.79
terday         0.78
pecially       0.78
etheless       0.76
senal          0.76
successor      0.75
whereabouts    0.73
olution        0.71
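The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of one common way to produce such a ranking. It assumes the per-token effect is the feature's decoder direction projected through the unembedding matrix (a W_dec row times W_U), with no layer-norm scaling. The function names, toy vocabulary, and weights are hypothetical, and the dashboard's exact procedure may differ.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: unembed is W_U laid out as d_model rows by vocab columns,
    so the effect on token t is sum_d decoder_row[d] * unembed[d][t].
    """
    vocab_size = len(unembed[0])
    return [
        sum(w * unembed[d][t] for d, w in enumerate(decoder_row))
        for t in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or most
    negative (positive=False) effects, paired with their values."""
    order = sorted(range(len(effects)), key=lambda t: effects[t],
                   reverse=positive)
    return [(tokens[t], effects[t]) for t in order[:k]]


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
tokens = ["own", "bear", "plight", "fish"]
decoder_row = [1.0, 0.5]
unembed = [[0.8, -0.6, 0.5, -0.4],
           [0.2, -0.2, 0.6, -0.5]]

effects = logit_effects(decoder_row, unembed)
print(top_logits(effects, tokens, k=2, positive=True))   # own first, then plight
print(top_logits(effects, tokens, k=2, positive=False))  # bear first, then fish
```

Sorting token indices rather than the values keeps each score paired with its token string, which is what the two lists above display.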
Activation density: 0.171%
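Activation density is the share of token positions on which the feature is active, so 0.171% corresponds to roughly 1.7 firings per 1,000 tokens. The following is a minimal sketch that assumes "active" means a strictly positive activation. The threshold parameter and the sample data are hypothetical, and the dashboard may use a different threshold or token sample.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" when its activation is > threshold
    (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 171 firing positions out of 100,000 tokens.
sample = [0.0] * 100_000
for i in range(0, 171 * 500, 500):
    sample[i] = 1.3
print(f"{activation_density(sample):.3f}%")  # 0.171%
```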