INDEX
Explanations
similes comparing actions to physical force or movement
comparisons or similes
Negative Logits
ilic       -0.79
inion      -0.76
byn        -0.76
ysical     -0.75
elin       -0.74
inas       -0.74
hiba       -0.73
hani       -0.73
iets       -0.72
Published  -0.72
Positive Logits
lihood    1.17
lier      1.02
liest     0.97
wildfire  0.89
clock     0.84
crazy     0.84
liness    0.80
minded    0.73
mad       0.72
idiots    0.72
Activations Density: 0.063%