Explanations
references to individuals and their roles, particularly in political or influential contexts
Negative Logits
es      -0.28
er      -0.21
eid     -0.19
eyn     -0.18
ea      -0.18
ease    -0.17
en      -0.16
oir     -0.16
erus    -0.16
erli    -0.16
Positive Logits
ging        0.28
ÌĨ          0.24
ged         0.23
gers        0.23
gregator    0.22
gregation   0.21
gle         0.21
lio         0.20
ued         0.19
gy          0.19
Activations Density 0.043%