INDEX
Explanations
similes comparing different situations
comparisons or similes
Negative Logits
erity     -0.92
iencies   -0.89
idates    -0.82
iets      -0.77
icators   -0.77
icator    -0.73
icity     -0.72
icals     -0.69
ixel      -0.69
odes      -0.68
Positive Logits
lier       1.34
liest      1.25
lihood     1.01
comparing  0.81
wildfire   0.79
waking     0.78
watching   0.75
liness     0.75
spitting   0.74
unto       0.72
Activations Density: 0.070%