INDEX
Explanations
mentions of public figures and political events
Negative Logits
Cause       -0.71
ords        -0.66
direction   -0.66
icum        -0.65
nect        -0.64
ゼウス      -0.63
"]=>        -0.62
beit        -0.62
uld         -0.62
lled        -0.62
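
Token lists like the one above are often displayed in a tokenizer's byte-level alphabet rather than as readable text, which is why a raw entry such as ãĤ¼ãĤ¦ãĤ¹ looks garbled. The sketch below assumes a GPT-2 style byte-level BPE tokenizer (the dashboard does not say which tokenizer is used) and shows how such a string can be mapped back to UTF-8 text. The function names byte_level_alphabet and decode_token are illustrative, not part of any library.

```python
def byte_level_alphabet() -> dict[int, str]:
    """Byte -> display character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokenizer follows the GPT-2 convention, where printable
    bytes map to themselves and all other bytes map to code points 256+.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in mapping:
            # Non-printable bytes are assigned the next free code point above 255.
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


def decode_token(raw: str) -> str:
    """Turn a byte-level display string back into readable UTF-8 text."""
    inverse = {ch: b for b, ch in byte_level_alphabet().items()}
    return bytes(inverse[ch] for ch in raw).decode("utf-8", errors="replace")


print(decode_token("ãĤ¼ãĤ¦ãĤ¹"))  # → ゼウス
```

Plain ASCII tokens such as alike pass through unchanged, and a leading Ġ in this alphabet decodes to a space.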
Positive Logits
alike          1.43
extraord       1.23
respectively   1.17
advocate       0.85
educator       0.73
Richard        0.70
enthusiast     0.67
Ann            0.65
activist       0.64
specializing   0.64
Activations Density 0.110%