INDEX
Explanations
information about individuals and their personal backgrounds or careers
Negative Logits
Token    Logit
1        -0.17
ker      -0.15
aira     -0.15
2        -0.15
Greg     -0.14
Dear     -0.14
cla      -0.13
ора      -0.13
952      -0.13
12       -0.13
POSITIVE LOGITS
Token    Logit
写       0.19
Writing  0.18
writing  0.18
寫       0.18
Writing  0.17
writing  0.17
撰       0.17
_Write   0.17
writ     0.16
ertiary  0.16
Activations Density 0.106%