INDEX
Explanations
references to personal experiences and relationships
Head Attr Weights
0:0.01
1:0.02
2:0.09
3:0.12
4:0.34
5:0.02
6:0.13
7:0.06
8:0.02
9:0.02
10:0.05
11:0.06
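
A minimal sketch for reading the head attribution weights above: it ranks the 12 heads by weight and sums them. The names `head_attr_weights` and `rank_heads` are hypothetical and chosen only for illustration. The sketch assumes each value is one head's share of the attribution; the listed values sum to 0.94 rather than 1.00, which is presumably a rounding effect in the display.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.01, 1: 0.02, 2: 0.09, 3: 0.12, 4: 0.34, 5: 0.02,
    6: 0.13, 7: 0.06, 8: 0.02, 9: 0.02, 10: 0.05, 11: 0.06,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]

# Sum of the displayed weights, rounded to the two decimals shown in the listing.
total = round(sum(head_attr_weights.values()), 2)

print(f"dominant head: {top_head} (weight {top_weight})")
print(f"sum of listed weights: {total}")
```

On these values, head 4 carries the largest weight (0.34), followed by heads 6 (0.13) and 3 (0.12).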
Negative Logits
idate: -1.59
successfully: -1.58
isively: -1.48
undai: -1.44
OTA: -1.41
eto: -1.35
appropriately: -1.34
姫: -1.32
accordingly: -1.31
ギ: -1.31
Positive Logits
curiosity: 1.59
pure: 1.42
continuation: 1.41
htaking: 1.39
coincidence: 1.38
intuition: 1.36
ticking: 1.35
exceptional: 1.31
scrib: 1.28
essentials: 1.24
Activations Density 0.062%