Explanations
verbs related to circumstances or consequences
Negative Logits
Token             Logit
anon              -0.72
atari             -0.69
=-=-=-=-=-=-=-=-  -0.68
Sheep             -0.68
ille              -0.66
asp               -0.65
arro              -0.65
arest             -0.62
aah               -0.62
Turtles           -0.62
Positive Logits
Token       Logit
diligence   1.16
giving      0.88
lling       0.80
dilig       0.73
itiz        0.71
itations    0.71
cancell     0.70
)=(         0.69
solely      0.66
irect       0.65
Activations Density 0.305%