Explanations
elements related to celebrity relationships and personal stories
Negative Logits
TOTYPE  -0.18
ukkit  -0.17
ục  -0.17
inely  -0.16
olars  -0.16
inas  -0.16
лиц  -0.16
azzi  -0.15
íg  -0.15
rtle  -0.15
Positive Logits
former  0.27
(token not shown)  0.26
veteran  0.25
pair  0.23
star  0.23
native  0.22
dimin  0.22
entert  0.21
aff  0.21
bes  0.20
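The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of producing such ranked lists. It assumes, as is common for sparse-autoencoder feature dashboards but not stated here, that each value is the feature's direct effect on that token's logit. `top_logit_tokens` and the `logit_effects` mapping are hypothetical names, and the toy input simply reuses a few values from the lists above.

```python
from typing import Dict, List, Tuple

TokenScore = Tuple[str, float]


def top_logit_tokens(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k most negative and the k most positive (token, effect) pairs.

    Assumption: `logit_effects` maps each vocabulary token to the (hypothetical)
    change in its logit when the feature fires.
    """
    # Sort ascending by effect: most negative effects come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy input built only from values shown in the dashboard lists.
    toy = {
        "TOTYPE": -0.18,
        "ukkit": -0.17,
        "rtle": -0.15,
        "former": 0.27,
        "veteran": 0.25,
        "bes": 0.20,
    }
    neg, pos = top_logit_tokens(toy, k=2)
    print(neg)  # [('TOTYPE', -0.18), ('ukkit', -0.17)]
    print(pos)  # [('former', 0.27), ('veteran', 0.25)]
```

With a real vocabulary of tens of thousands of tokens, the same two slices give the bottom-k and top-k lists. If the vocabulary were smaller than 2k, the two lists would overlap.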
Activation Density: 0.200%
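A density of 0.200% means the feature is active on roughly 2 of every 1,000 tokens. The sketch below shows how such a figure can be computed. It assumes that "active" means an activation strictly above zero, which is a common convention for sparse-autoencoder features but is not stated on this page. `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    # Avoid division by zero on an empty input.
    return active / total if total else 0.0


if __name__ == "__main__":
    # 1,000 tokens, of which exactly 2 activate the feature.
    acts = [0.0] * 998 + [1.3, 0.4]
    print(f"{activation_density(acts):.3%}")  # 0.200%
```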