Explanations
narrative sequences involving interactions between people
sudden, unexpected events or encounters
Negative Logits
Token        Logit
£ı           -0.74
redes        -0.71
vre          -0.70
satisf       -0.70
unal         -0.69
utterstock   -0.69
merits       -0.68
Profit       -0.65
independ     -0.65
Correct      -0.64
Positive Logits
Token        Logit
backstage    0.92
[            0.90
['           0.86
yelling      0.85
Coach        0.84
somebody     0.83
uh           0.82
saying       0.81
fuckin       0.79
wanna        0.79
Activations Density 0.555%