INDEX
Explanations
transitional phrases indicating conclusions or summaries
Head Attr Weights
0:0.01
1:0.06
2:0.10
3:0.09
4:0.02
5:0.03
6:0.12
7:0.06
8:0.08
9:0.19
10:0.05
11:0.13
Negative Logits
ovo         -1.06
IED         -1.00
AMA         -1.00
OVA         -1.00
Ping        -0.97
UG          -0.96
HAM         -0.93
Depression  -0.92
______      -0.91
AI          -0.90
Positive Logits
"$:/        1.36
forth       1.35
fortune     1.19
eming       1.18
fter        1.18
heses       1.16
ablishment  1.15
inth        1.14
ende        1.13
ocious      1.12
Activations Density 0.021%