Explanations
names of people or places
mentions of individuals or their names
Negative Logits
Token        Logit
replay       -0.68
AFTA         -0.64
cffffcc      -0.63
Proced       -0.62
milo         -0.62
coma         -0.59
Rats         -0.58
blot         -0.58
Solitaire    -0.58
psychiat     -0.58
Positive Logits
Token        Logit
enthal       0.98
igans        0.93
enger        0.91
know         0.80
qt           0.80
hed          0.76
é¾įåĸļ士    0.75
erd          0.75
ington       0.75
eller        0.74
Activations Density: 0.067%