INDEX
Explanations
instances of the word "or"
Head Attr Weights
0:0.08
1:0.09
2:0.07
3:0.07
4:0.08
5:0.07
6:0.08
7:0.07
8:0.08
9:0.08
10:0.08
11:0.10
Negative Logits
aber: -3.02
bis: -2.56
isma: -2.54
bis: -2.52
subconscious: -2.50
beit: -2.46
ibu: -2.43
ifle: -2.31
DC: -2.31
ghai: -2.29
Positive Logits
Trainer: 3.29
Sisters: 2.99
Superintendent: 2.81
Planned: 2.66
VIDEOS: 2.61
Secretary: 2.58
Stephenson: 2.47
Creator: 2.46
Apostle: 2.46
Strategy: 2.44
Activations Density 0.000%