INDEX
Explanations
references to changes or modifications
references to the concept of change
Negative Logits
amina       -0.76
Bei         -0.69
NK          -0.61
LIMITED     -0.60
ducks       -0.59
vern        -0.59
Saud        -0.59
ographies   -0.58
Peninsula   -0.57
DRAGON      -0.57
Positive Logits
over        1.08
overs       0.96
effected    0.92
wrought     0.89
making      0.87
able        0.83
agents      0.80
itri        0.77
maker       0.77
xual        0.77
Activations Density 0.060%