INDEX
Explanations
elements related to locations or descriptions of physical objects in a specific setting
Negative Logits
contracted    -0.64
authorized    -0.62
agre          -0.62
Instr         -0.61
destro        -0.60
Specific      -0.60
levant        -0.58
Fac           -0.58
livest        -0.58
spec          -0.57
Positive Logits
!       0.98
!:      0.95
?!      0.92
!?      0.91
huh     0.88
!!      0.84
!!!     0.84
!'      0.84
!!!!    0.83
?       0.79
Activations Density: 0.625%