Explanations
Phrases that describe comparisons and contrasting situations
Negative Logits
essen     -0.19
uj        -0.16
835       -0.15
ehr       -0.15
rike      -0.15
ochond    -0.15
oleon     -0.15
emmel     -0.15
nici      -0.14
akit      -0.14
Positive Logits
ones      0.30
ones      0.16
Ones      0.16
ぐ        0.14
lik       0.14
Dit       0.14
dit       0.14
lico      0.14
それは    0.14
Fav       0.13
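Dashboards like this one commonly build the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the resulting score. The page does not state its method, so the sketch below is an assumption. It uses hypothetical helper names (`logit_weights`, `top_and_bottom`) and a toy four-token vocabulary with made-up weights. A real pipeline would use the model's actual unembedding matrix and might apply extra normalization.

```python
from typing import Sequence


def logit_weights(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> list[float]:
    """Score every vocabulary token as direction @ W_U (assumed method).

    direction: the feature's decoder vector, length d_model.
    w_u: unembedding matrix with d_model rows, each of length vocab_size.
    """
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]            # most negative first
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical weights).
vocab = ["ones", "essen", "lik", "uj"]
direction = [1.0, 0.5]
w_u = [
    [0.2, -0.1, 0.1, -0.1],
    [0.2, -0.2, 0.0, -0.1],
]
positive, negative = top_and_bottom(logit_weights(direction, w_u), vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting the full score list once and slicing both ends keeps the two tables consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.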
Activation Density: 0.149%
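Activation density is usually read as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. By that reading, 0.149% is roughly 1.5 tokens in every 1,000. The sketch below assumes that definition and a zero threshold. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [0.5, 1.2, 0.3]
print(f"{activation_density(sample):.3f}%")
```
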