Explanations
references to personal experiences or attributes
Negative Logits
Token      Logit
mijne      -0.38
ggior      -0.37
itieren    -0.34
маса       -0.34
يتيمه      -0.34
millón     -0.33
nakalista  -0.32
défa       -0.32
razio      -0.32
갓         -0.31
Positive Logits
Token       Logit
Personal    0.86
personal    0.80
personal    0.78
Personal    0.77
個人        0.72
个人        0.71
PERSONAL    0.70
personale   0.69
PERSONAL    0.69
Individual  0.66
Activations Density 1.141%
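
A minimal sketch of how a density figure like the one above could be computed, assuming it means the fraction of sampled token positions on which the feature's activation is greater than zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations for ten token positions (not real data).
sample = [0.0] * 7 + [2.5] + [0.0] * 2
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 1.141% would mean the feature fires on roughly one token position in every 88 sampled.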