Explanations
people or entities mentioned in news articles, especially politicians or public figures
Negative Logits
token        logit
taboola      -0.71
wagen        -0.65
juggling     -0.61
incremental  -0.61
muc          -0.61
inconven     -0.60
sshd         -0.60
metaphor     -0.60
contraction  -0.59
levers       -0.58
Positive Logits
token        logit
aline        0.99
andise       0.80
itness       0.78
onso         0.77
thood        0.76
aida         0.75
bia          0.75
anto         0.73
asio         0.73
minus        0.73
Activations Density: 0.045%