INDEX
Explanations
references to specific individuals, especially politicians and cultural figures, along with associated locations and events
Negative Logits
æ¥       -0.18
rand     -0.16
aise     -0.15
usan     -0.15
atform   -0.14
Seymour  -0.14
Merit    -0.14
appear   -0.14
udic     -0.14
/stdc    -0.14
Positive Logits
-h       0.20
-H       0.16
ilda     0.15
mons     0.15
iles     0.15
Ùĩد      0.15
chan     0.15
imbus    0.15
_h       0.15
HV       0.15
Activations Density 0.055%