INDEX
Explanations
phrases and expressions indicating similarity or preference
Negative Logits
acco         -0.16
asca         -0.16
ses          -0.16
rou          -0.16
asco         -0.15
roe          -0.15
lio          -0.14
べき         -0.14
,LOCATION    -0.14
また         -0.14
Positive Logits
-minded      0.37
minded       0.29
unto         0.29
vậy          0.25
WISE         0.24
able         0.24
clock        0.23
nhau         0.22
wildfire     0.20
-wise        0.20
Activations Density 0.095%