INDEX
Explanations
sentences related to political and social issues
Negative Logits
hement      -0.70
regon       -0.64
avery       -0.64
taboola     -0.63
utherland   -0.62
mbuds       -0.61
xes         -0.61
ipped       -0.61
ttes        -0.60
wright      -0.60
Positive Logits
happening           0.85
piring              0.73
natureconservancy   0.67
transpired          0.66
difference          0.64
happen              0.64
motivating          0.62
bothering           0.62
grunt               0.62
dstg                0.61
Activations Density 0.171%