INDEX
Explanations
mentions of specific names or locations
references to specific people or institutions
Negative Logits
Token      Logit
vals       -0.92
val        -0.91
662        -0.87
Camel      -0.86
vae        -0.86
Ventures   -0.83
Ish        -0.83
Alv        -0.81
Farrell    -0.81
cam        -0.80
Positive Logits
Token          Logit
rick           1.18
ricks          1.17
ick            1.02
ICK            0.97
interstitial   0.87
HEAD           0.86
icking         0.85
icks           0.84
jriwal         0.83
icker          0.82
Activations Density: 0.446%