INDEX
Explanations
references to audience reactions and interactions
Negative Logits
Ide          -0.86
Plum         -0.82
erald        -0.81
empt         -0.80
Scot         -0.78
Athletics    -0.76
phrine       -0.75
Franch       -0.75
Ski          -0.74
grave        -0.73
Positive Logits
atics        1.10
audience     0.99
iences       0.97
ience        0.94
room         0.94
members      0.92
atically     0.90
ele          0.89
IENCE        0.89
member       0.89
Activations Density 0.904%