INDEX
Explanations
phrases related to social structures and their implications
Negative Logits
obook            -0.75
unfocusedRange   -0.70
ocracy           -0.64
chair            -0.63
ropolis          -0.63
pod              -0.63
aroo             -0.62
phas             -0.61
bus              -0.61
Tuc              -0.59
Positive Logits
respectively     1.48
alike            1.27
mutually         1.03
intertwined      1.03
jointly          0.99
trademarks       0.95
insepar          0.92
examples         0.90
fronts           0.88
both             0.86
Activations Density: 0.214%