INDEX
Explanations
references to changes or transformations
mentions of the word "change."
Negative Logits
Token      Logit
amina      -0.82
Bei        -0.74
ducks      -0.67
ross       -0.66
Purg       -0.65
vern       -0.64
LIMITED    -0.63
DRAGON     -0.63
Gerr       -0.63
Whale      -0.63
Positive Logits
Token      Logit
over       0.98
overs      0.92
wrought    0.85
effected   0.83
able       0.82
making     0.81
agents     0.79
ials       0.78
xual       0.78
iations    0.77
Activations Density: 0.058%