Explanations
similes comparing actions or events to the way something else behaves
Negative Logits
ulic        -0.70
hirt        -0.69
qua         -0.68
inas        -0.68
SEE         -0.67
VICE        -0.66
ilic        -0.65
inge        -0.65
icipated    -0.65
hetics      -0.64
Positive Logits
wildfire     1.20
crazy        1.05
clock        0.97
bandits      0.86
mad          0.86
liest        0.84
weeds        0.84
flies        0.84
glue         0.82
rabbits      0.82
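The page lists the tokens whose logits this feature pushes down most and up most, but it does not say how those scores are computed. Dashboards of this kind often use direct logit attribution: take the dot product of the feature's decoder direction with each token's unembedding vector, then report the k lowest and k highest scores. The sketch below assumes that method. The function names `logit_scores` and `top_and_bottom`, the row-per-token layout of the unembedding, and the toy data are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch, assuming direct logit attribution: a token's score is
# the dot product of the feature's decoder direction with that token's
# unembedding vector (here stored as one row per vocabulary token).

def logit_scores(decoder_dir, unembed_rows, vocab):
    """Return {token: dot(decoder_dir, unembedding row for token)}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, row))
        for tok, row in zip(vocab, unembed_rows)
    }

def top_and_bottom(scores, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
scores = logit_scores([1.0, 0.0], [[2.0, 0.0], [-1.0, 5.0], [0.5, 1.0]], ["a", "b", "c"])
negative, positive = top_and_bottom(scores, k=1)
print(negative, positive)
```

With a real model, the vocabulary would be the tokenizer's full vocabulary and k would be 10 to match the lists above.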
Activation density: 0.053%
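Activation density is read here as the share of sampled tokens on which the feature fires. That reading is an assumption, since the page does not define the term. Under it, a density of 0.053% corresponds to about 53 activating tokens per 100,000. A minimal sketch follows. The function name `activation_density` and the default threshold of 0 are hypothetical choices.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (assumed here to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy example: the feature fires on 53 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(53):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3f}%")
```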