Explanations
No Explanations Found
Negative Logits
Token      Logit
Firearms   0.65
firearms   0.63
guns       0.60
gun        0.58
枪         0.58
銃         0.57
Guns       0.55
GUN        0.53
Gun        0.52
GUN        0.52
Positive Logits
Token      Logit
clip       1.22
clips      1.20
Clip       1.09
clip       1.06
Clip       1.06
Clips      0.98
clips      0.94
magazine   0.93
magazines  0.88
CLIP       0.82
Activations Density: 0.005%