INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.07
2:0.08
3:0.08
4:0.08
5:0.07
6:0.08
7:0.08
8:0.09
9:0.08
10:0.07
11:0.08
Negative Logits
ngth    -3.54
shit    -3.08
Ty      -2.98
Gon     -2.88
Shit    -2.67
imp     -2.64
TC      -2.61
fuck    -2.60
Stre    -2.60
�       -2.55
Positive Logits
..."      2.71
Atlantis  2.61
`.        2.53
itage     2.50
oire      2.49
Answer    2.42
��        2.41
alus      2.37
nexus     2.35
onite     2.35
Activations Density 0.000%
No Known Activations
This feature has no known activations.