INDEX
Explanations
numerical data or references
Negative Logits
Token       Logit
198         -0.25
Reagan      -0.22
aldi        -0.16
ardo        -0.16
Peek        -0.15
۱۹۸         -0.15
ouse        -0.15
Jennifer    -0.14
Jennifer    -0.14
zi          -0.14
Positive Logits
Token       Logit
69          0.37
68          0.34
71          0.30
66          0.30
70          0.29
67          0.28
069         0.24
72          0.24
169         0.23
068         0.23
Activations Density: 0.098%