INDEX
Explanations
specific phrases or sentence structures
Head Attr Weights
0:0.04
1:0.06
2:0.03
3:0.04
4:0.03
5:0.03
6:0.13
7:0.02
8:0.04
9:0.48
10:0.02
11:0.03
Negative Logits
Token     Logit
Kab       -3.95
amps      -3.80
chalk     -3.64
Snap      -3.46
Mub       -3.43
uba       -3.38
amb       -3.31
Kung      -3.27
saliva    -3.27
Sabb      -3.16
Positive Logits
Token     Logit
OR        6.12
ORS       5.21
orio      5.17
Lor       5.08
ori       5.02
oris      4.97
oros      4.93
vor       4.90
Bor       4.82
or        4.81
Activations Density 0.050%