INDEX
Explanations
specific instances of the word "this"
Head Attr Weights
0:0.09
1:0.06
2:0.07
3:0.09
4:0.09
5:0.08
6:0.08
7:0.08
8:0.08
9:0.07
10:0.07
11:0.07
Negative Logits
resonance       -2.82
#####           -2.76
lia             -2.65
iceberg         -2.62
compositions    -2.62
(\              -2.60
thereal         -2.57
chant           -2.57
ioxide          -2.56
elight          -2.51
Positive Logits
Rutgers         3.44
apo             3.39
Duck            3.22
Princeton       2.66
Federal         2.62
Federal         2.60
Staten          2.56
Fruit           2.44
PLUS            2.44
Ellis           2.42
Activations Density 0.000%