INDEX
Explanations
references to power dynamics and authority figures
Negative Logits
NavController    -0.46
universality     -0.45
UNIVERSAL        -0.42
CION             -0.40
endTime          -0.40
rimid            -0.40
PWD              -0.40
setTime          -0.40
Universal        -0.39
nakalista        -0.39
Positive Logits
цездатний        0.51
antMatchers      0.48
Tembelea         0.46
temper           0.44
cultivation      0.43
tagext           0.42
cultivated       0.41
jores            0.40
mergeFrom        0.40
étudi            0.39
Activations Density 0.084%
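
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature fires at all. The page does not state how its value was derived, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the figure above
```

Under this reading, a density of 0.084% would mean the feature is active on roughly 8 out of every 10,000 token positions.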