INDEX
Explanations
references to personal experiences and demographics related to living in the US
Negative Logits
themſelves  -0.87
GOTREF  -0.86
ویکیپدیای  -0.83
itſelf  -0.78
campista  -0.78
pleaſure  -0.76
ſtate  -0.76
ſelf  -0.76
myſelf  -0.75
Efq  -0.74
Positive Logits
,  0.55
(token not rendered)  0.47
Be  0.43
kind  0.41
generally  0.40
I  0.40
makes  0.40
living  0.39
…  0.39
consultato  0.39
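A minimal sketch of how lists like the two above are commonly produced, assuming (this card does not confirm it) that each score is the direct effect of the feature's decoder direction on the output logits, i.e. the decoder row projected onto the unembedding matrix with the final LayerNorm ignored. The names `top_logit_effects`, `w_dec_row`, `W_U` and `vocab` are hypothetical illustrations, not a real library API.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: w_dec_row is the feature's decoder vector, shape (d_model,),
    and W_U is the unembedding matrix, shape (d_model, d_vocab).
    The final LayerNorm is ignored in this sketch.
    """
    effects = w_dec_row @ W_U          # one score per vocabulary token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -2.0]])
w_dec_row = np.array([1.0, 0.5])
neg, pos = top_logit_effects(w_dec_row, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('c', -1.0), ('d', -0.5)]
print(pos)  # [('a', 1.0), ('b', 0.5)]
```

Under this reading, a strongly negative entry such as themſelves (-0.87) means the feature's direction pushes that token's logit down when the feature is active.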
Activations Density 0.288%
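Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the helper name `activation_density` and the threshold are hypothetical. At 0.288%, the feature would be active on roughly 288 of every 100,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per sampled token.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


density = activation_density([0.0, 0.3, 0.0, 0.0, 1.2])
print(f"{density:.3%}")  # 40.000%
```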