Explanations
references to a specific name "Dorner"
mentions of a specific individual named Dor
Negative Logits
anwhile      -0.80
unct         -0.69
episode      -0.68
Terrorism    -0.67
unction      -0.67
testing      -0.66
heet         -0.65
ulative      -0.65
WAYS         -0.64
uncture      -0.63
Positive Logits
Dor          1.06
cas          0.95
ía           0.93
iane         0.91
oshenko      0.85
je           0.85
ado          0.82
chester      0.81
idad         0.79
adan         0.79
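The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of how such lists could be produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The helper name `top_logit_effects` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit effect) pairs.

    Hypothetical helper (an assumption, not the dashboard's own code):
    `unembed` is laid out as d_model rows x vocab_size columns, and a token's
    effect is the dot product of `decoder_dir` with that token's column.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir length must equal the number of unembed rows")
    # Logit effect of the feature direction on every vocabulary token.
    effects = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy example with made-up numbers: d_model = 2, vocabulary of 3 tokens.
    pos, neg = top_logit_effects(
        [1.0, 0.5],
        [[1.0, -1.0, 0.0], [0.2, 0.0, -0.4]],
        ["Dor", "unct", "ía"],
        k=1,
    )
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary. A real model would use a tensor library for the matrix product instead of nested Python loops.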
Activation density: 0.004%
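Activation density is read here as the percentage of sampled tokens on which the feature fires. Below is a minimal sketch under two assumptions: a token counts as firing when its activation is strictly above zero, and the helper name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Hypothetical helper; the default zero threshold is an assumption.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Made-up example: 4 firing tokens out of 100,000 sampled tokens.
    acts = [0.0] * 100_000
    for i in (10, 500, 9_000, 77_777):
        acts[i] = 1.3
    print(f"{activation_density(acts):.3f}%")
```

For example, 4 firing tokens out of 100,000 sampled tokens gives 100 × 4 / 100,000 = 0.004%, so a density this low means the feature is active on roughly 1 token in 25,000.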