INDEX
Explanations
website links or tags
Negative Logits
Hicks      -0.85
Hatt       -0.78
HEAD       -0.76
hous       -0.75
ipp        -0.75
Hipp       -0.71
Heads      -0.70
Hick       -0.68
fingert    -0.68
beck       -0.68
Positive Logits
v          1.41
V          1.33
va         1.20
vez        1.17
vir        1.15
vi         1.13
Vs         1.12
vu         1.12
VI         1.11
ov         1.11
Activations Density 0.116%