Explanations
phrases related to actions causing change or consequences
actions related to transformation or change
Negative Logits
ancies        -0.62
idence        -0.59
inia          -0.56
occupancy     -0.56
redo          -0.54
rise          -0.54
dom           -0.54
awks          -0.53
ajo           -0.53
brate         -0.53
Positive Logits
hostage       0.74
aundering     0.73
Ń·            0.67
ety           0.67
../           0.66
arbitrarily   0.66
UC            0.64
ĸļ            0.64
YING          0.64
by            0.64
Activations Density 0.226%