INDEX
Explanations
references to life events or conditions
Negative Logits
rove    -0.17
vil     -0.17
IGO     -0.15
دÙĩ     -0.15
Ku      -0.15
и       -0.15
ève     -0.15
rib     -0.14
warm    -0.14
otte    -0.14
Positive Logits
inha    0.17
utto    0.15
hangi   0.15
itzer   0.14
iri     0.14
Casual  0.14
ibold   0.14
adoo    0.14
imers   0.14
antan   0.14
Activations Density: 0.000%