Explanations
mentions of abstract concepts, particularly related to plans or theories
Negative Logits
Token        Logit
idea         -1.34
result       -1.34
fact         -1.34
way          -1.32
possibility  -1.30
situation    -1.28
question     -1.22
reason       -1.20
conclusion   -1.15
notion       -1.13
Positive Logits
Token        Logit
pareti       0.87
braccia      0.85
tasche       0.84
elettrico    0.84
LookAnd      0.83
petto        0.81
devenus      0.80
gambe        0.79
bonté        0.77
situés       0.77
Activations Density 3.443%