Explanations
the repetition of the word "of" in various contexts
Head Attribution Weights (head: weight)
0:0.05
1:0.02
2:0.19
3:0.04
4:0.36
5:0.04
6:0.03
7:0.03
8:0.05
9:0.07
10:0.04
11:0.02
Negative Logits
フォ  -1.70
ouch  -1.66
�  -1.56
trak  -1.55
Lover  -1.47
ington  -1.47
Home  -1.47
方  -1.45
rongh  -1.41
ilet  -1.38
Positive Logits
absurdity  1.91
exhaustion  1.62
negativity  1.57
intoxication  1.54
abstraction  1.50
incomp  1.49
contradictions  1.48
humour  1.47
culus  1.46
error  1.44
Activation Density: 0.014%
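Activation density is usually read as the share of tokens on which the feature fires. The dashboard does not state its threshold, so the sketch below assumes "fires" means an activation above 0; the function name `activation_density` and the toy activations are hypothetical. Under that reading, 0.014% works out to about 14 tokens in every 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over 100,000 tokens, nonzero on 14 of them.
acts = [0.0] * 100_000
for i in range(14):
    acts[i] = 1.0

print(f"{activation_density(acts) * 100:.3f}%")  # 0.014%
```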