INDEX
Explanations
names of individuals, particularly "Nate."
mentions of specific individuals, particularly focusing on the name "Nate."
Negative Logits
iance     -1.07
loo       -0.97
iances    -0.91
ijk       -0.86
ily       -0.82
osite     -0.79
usc       -0.77
stadt     -0.74
ies       -0.74
aunder    -0.74
Positive Logits
IELD      0.77
heastern  0.75
hower     0.71
ãĤ¡       0.69
BLIC      0.65
Manson    0.65
eor       0.64
ÃįÃį      0.64
ppo       0.63
Niet      0.62
Activations Density: 0.078%