INDEX
Explanations
mentions of the name "Zuma" at varying activation levels
repeated mentions of the name "Zuma."
Negative Logits
neath        -0.71
tons         -0.69
icles        -0.69
ician        -0.68
Canadians    -0.68
Ö¼           -0.66
Kear         -0.65
igree        -0.63
working      -0.62
sheet        -0.62
Positive Logits
ppa          0.99
uma          0.97
BLE          0.88
ascus        0.84
ULT          0.81
UGH          0.80
qua          0.79
isoft        0.76
urs          0.76
ffe          0.73
Activations Density: 0.013%