INDEX
Explanations
phrases related to disagreement and political ideologies
Negative Logits
amina        -0.62
GV           -0.57
oufl         -0.55
Roads        -0.54
hens         -0.54
adrenaline   -0.53
Jackets      -0.53
maximum      -0.52
spring       -0.51
IFT          -0.50
Positive Logits
rences        0.72
vehemently    0.69
ially         0.66
uously        0.63
lihood        0.63
ably          0.62
disagree      0.62
atively       0.61
passionately  0.60
ively         0.60
Activations Density: 7.929%