Explanations
references to particular groups within specific historical contexts
Negative Logits
Token          Logit
procs          -0.16
hood           -0.15
oles           -0.14
outstanding    -0.14
isle           -0.14
hora           -0.14
filer          -0.14
lua            -0.14
ãĥªãĤ«         -0.14
unning         -0.13
Positive Logits
Token          Logit
ze             0.55
z              0.54
zer            0.51
zen            0.50
zes            0.47
zt             0.46
zy             0.44
zs             0.40
zb             0.40
zk             0.39
Activations Density: 0.056%