Explanations
relationships involving loyalty and betrayal
Negative Logits
leftright  -0.20
vara       -0.16
ingles     -0.16
IFn        -0.15
liers      -0.15
ael        -0.15
nga        -0.15
ورد        -0.15
ovsky      -0.14
655        -0.14
Positive Logits
ея          0.15
mt          0.14
imir        0.14
hay         0.14
embr        0.14
Pav         0.14
Patriot     0.14
ance        0.13
ius         0.13
Pathfinder  0.13
Activations Density 0.045%