INDEX
Explanations
mentions of people or events in news articles
Negative Logits
polyg       -0.78
showers     -0.70
lapse       -0.64
Objects     -0.64
detail      -0.61
consum      -0.61
Redd        -0.61
Templ       -0.59
brig        -0.58
contrace    -0.58
Positive Logits
til             1.30
Cause           1.06
cause           1.03
Mech            0.98
tis             0.93
ths             0.85
arak            0.85
Interstitial    0.83
atri            0.83
cue             0.83
Activations Density: 0.041%