Explanations
countries, cities, and organizations
the end-of-text token
Negative Logits
ilar          -0.61
ãĤ¦ãĤ¹        -0.61
atever        -0.60
bler          -0.59
ueless        -0.58
akespe        -0.56
herer         -0.55
farious       -0.55
ilaterally    -0.55
ttes          -0.54
Positive Logits
analyst           0.61
officials         0.61
reacts            0.58
archaeologists    0.56
awoke             0.55
mourn             0.55
spokesman         0.55
intends           0.54
zbollah           0.53
Oversight         0.53
Activations Density: 0.372%