Explanations
references to specific roles and characters in film or theater
Negative Logits
es      -0.17
ota     -0.16
uj      -0.15
eda     -0.15
och     -0.15
ij      -0.15
.gb     -0.15
uji     -0.15
riba    -0.15
кін     -0.15
Positive Logits
č       0.19
vn      0.18
š       0.18
ří      0.17
ď       0.17
bn      0.17
legg    0.16
zí      0.16
zn      0.16
jc      0.16
Activations Density 0.012%
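
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that density means the percentage of tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = share of tokens with activation > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 100,000 tokens, 12 of which activate the feature.
sample = [0.0] * 99_988 + [1.5] * 12
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.012% corresponds to roughly 12 active tokens per 100,000.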