INDEX
Explanations
the word "Derby" at different activations
mentions of Derby and individuals related to it
Negative Logits
itar         -0.95
ities        -0.93
ITIES        -0.90
ité          -0.87
itary        -0.86
ropri        -0.86
ified        -0.84
uador        -0.84
itol         -0.81
itatively    -0.80
Positive Logits
lli          0.86
Devils       0.78
detector     0.74
Fax          0.73
DoS          0.73
lla          0.71
Derby        0.69
sting        0.68
Gibbs        0.67
detectors    0.67
Activations Density 0.042%