Explanations
email addresses and sender names
Negative Logits
pleaſure    -0.94
Monfieur    -0.92
houſe       -0.92
purpoſe     -0.89
Theſe       -0.88
متعلقه      -0.88
+#+#        -0.87
Efq         -0.87
ſmall       -0.86
myſelf      -0.83
Positive Logits
so          0.41
b           0.40
entfer      0.40
z           0.39
Modific     0.37
irradiated  0.37
diputado    0.37
arn         0.36
шный        0.35
giả         0.35
Activations Density 0.262%