INDEX
Explanations
phrases indicating involvement or participation in actions or events
Negative Logits
inkel    -0.15
akra     -0.15
elin     -0.15
Abb      -0.15
YPE      -0.15
cape     -0.15
ouz      -0.14
irk      -0.14
anch     -0.14
cad      -0.14
Positive Logits
apiro       0.14
Lucas       0.14
aina        0.14
paraph      0.14
ạt          0.13
Paula       0.13
dbl         0.13
Jak         0.13
col         0.13
doubling    0.13
Activations Density 0.211%