Explanations
references to changes and transformations in societal norms or practices
Negative Logits
Token            Logit
oleon            -0.16
actable          -0.15
erb              -0.15
rát              -0.14
Redistributions  -0.14
oog              -0.14
loit             -0.14
opak             -0.14
rnek             -0.14
iasm             -0.14
Positive Logits
Token            Logit
abandon          0.52
ditch            0.49
abandoned        0.46
abandonment      0.43
abandoning       0.43
fors             0.37
discard          0.35
discarded        0.34
dropped          0.34
Dump             0.33
Activations Density: 0.677%