Explanations
components related to interactions and relationships in various contexts
Negative Logits
ardy      -0.14
anter     -0.13
IPP       -0.12
kud       -0.12
kart      -0.12
ase       -0.12
ipp       -0.12
302       -0.12
757       -0.12
kip       -0.12
Positive Logits
within    1.23
within    1.16
Within    1.09
Within    1.07
inside    1.01
dentro    0.98
_within   0.96
inside    0.89
Inside    0.84
Inside    0.81
Activations Density 0.967%