INDEX
Explanations
phrases describing forceful physical actions
comparative phrases using the word "like."
Negative Logits
hiba       -0.85
ulty       -0.82
inion      -0.79
iets       -0.79
alf        -0.74
ilic       -0.72
oard       -0.70
endiary    -0.69
omsky      -0.68
ourse      -0.68
Positive Logits
lihood     1.56
lier       1.09
liest      1.08
wildfire   0.87
liness     0.85
ours       0.80
clock      0.79
minded     0.77
minded     0.77
crazy      0.69
Activations Density 0.082%