INDEX
Explanations
changing paradigms and perceptions
Negative Logits
correctamente    0.46
correctly        0.41
优点             0.40
properly         0.40
правильно        0.39
correctement     0.38
высокой          0.37
poseb            0.36
buenas           0.36
improvements     0.36
Positive Logits
perceptions      0.86
priorities       0.84
paradigms        0.79
perception       0.79
attitudes        0.77
behavior         0.73
paradigm         0.71
paradigma        0.71
approaches       0.70
восприя          0.70
Activations Density: 0.065%