Explanations
No Explanations Found
Negative Logits
Token       Logit
uro         -0.14
iw          -0.14
ipo         -0.14
ken         -0.14
floor       -0.14
ffects      -0.14
essional    -0.13
wich        -0.13
floor       -0.13
Pow         -0.13
Positive Logits
Token       Logit
principle   0.25
practice    0.21
presence    0.20
contrast    0.19
contrad     0.18
practice    0.18
spirit      0.17
analogy     0.17
presence    0.17
absence     0.17
Activations Density: 0.068%