INDEX
Explanations
instances of topics related to actions or observations
Head Attr Weights
0:0.07
1:0.08
2:0.07
3:0.07
4:0.09
5:0.07
6:0.08
7:0.09
8:0.07
9:0.07
10:0.10
11:0.08
Negative Logits
aunder       -2.34
quished      -2.12
ateful       -2.07
ullivan      -2.06
flower       -2.05
gift         -2.04
ertodd       -2.04
emption      -2.03
reimburse    -2.02
warm         -1.97
Positive Logits
API           2.40
ICO           2.33
DX            2.16
TM            2.00
src           2.00
DSM           1.97
LAB           1.93
ICS           1.93
��極          1.92
fundamentals  1.91
Activations Density 0.000%