Explanations
the concept of 'change'
references to the concept of change
Negative Logits
amina       -0.79
LIMITED     -0.77
ç«          -0.72
AFB         -0.72
vern        -0.69
mination    -0.68
¯¯¯¯        -0.68
-+-+        -0.67
APH         -0.66
Bei         -0.66
Positive Logits
overs          0.88
over           0.85
able           0.75
xual           0.74
wrought        0.74
agents         0.72
blindness      0.71
making         0.69
drastically    0.67
radically      0.67
Activations Density: 0.038%