Explanations
the presence of the substring "Tr" within words
Negative Logits
Token        Logit
ozí          -0.20
iap          -0.18
yh           -0.18
yd           -0.17
iem          -0.17
enko         -0.16
eners        -0.16
aimassage    -0.16
oit          -0.15
ertas        -0.15
Positive Logits
Token        Logit
acy          0.30
inity        0.29
avis         0.29
inidad       0.29
ailer        0.28
usted        0.27
istan        0.27
actor        0.27
udeau        0.26
ained        0.25
Activations Density: 0.011%