INDEX
Explanations
words related to negative experiences or actions
Negative Logits
Token        Logit
DonaldTrump  -0.65
Paddock      -0.59
ibrary       -0.56
Eden         -0.55
Kaiser       -0.55
House        -0.54
Collider     -0.53
Everest      -0.53
Wildcats     -0.52
Pillar       -0.52
Positive Logits
Token    Logit
alike    0.73
eworthy  0.71
ifies    0.71
ilit     0.71
versa    0.69
ifying   0.66
ining    0.66
ify      0.66
igmat    0.66
aund     0.65
Activations Density 0.253%