INDEX
Explanations
references to personal experiences and relationships
Negative Logits
fallen    -0.16
دÙĬ    -0.16
leck    -0.15
uzzi    -0.15
ridden    -0.14
Lage    -0.14
ewire    -0.14
sooner    -0.14
ovation    -0.14
æŀ    -0.14
Positive Logits
Pul    0.28
Ke    0.26
Ta    0.25
Bro    0.22
Sp    0.22
Shared    0.21
Open    0.21
Ran    0.20
Drew    0.20
Met    0.20
Activations Density 0.211%