Explanations
item type or attribute identification
Negative Logits
Елена  0.48
ataque  0.46
кризи  0.46
지와  0.44
ambivalent  0.44
要注意  0.43
aparikkh  0.43
divergents  0.43
swearing  0.43
susceptibles  0.43
Positive Logits
type  0.56
类型  0.54
type  0.54
number  0.52
Yes  0.50
Technology  0.49
Type  0.49
types  0.49
size  0.49
類型  0.49
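The logit lists above can be read as a ranking of vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). A minimal sketch follows, assuming, as is common for sparse-autoencoder feature dashboards, that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The function names `logit_effects` and `top_logits` and the toy two-dimensional vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative token scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy data: a 2-d decoder direction and three token vectors.
effects = logit_effects([1.0, 0.0], {
    "type": [0.5, 0.1],
    "size": [0.3, 0.0],
    "swearing": [-0.4, 0.2],
})
positive, negative = top_logits(effects, 1)
print(positive, negative)  # [('type', 0.5)] [('swearing', -0.4)]
```

In a real model the decoder direction and unembedding would have thousands of dimensions and the vocabulary tens of thousands of tokens, but the ranking step is the same.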
Activation density: 0.034%
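Activation density is the share of tokens on which the feature fires, so 0.034% corresponds to roughly 34 tokens in every 100,000. A minimal sketch, assuming density counts activations strictly above zero over a sample of tokens; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled: report zero density
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# 34 active tokens out of 100,000 sampled gives a density of about 0.034%.
sample = [1.0] * 34 + [0.0] * 99_966
print(activation_density(sample))
```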