INDEX
Explanations
personal pronouns and common verbs, indicating a focus on personal experiences and interactions
Negative Logits
unthinkable   -0.15
chois         -0.14
entlich       -0.14
xDE           -0.14
burgh         -0.14
choosing      -0.13
620           -0.13
090           -0.13
lectual       -0.13
lựa           -0.13
Positive Logits
curiosity     0.27
curious       0.23
learn         0.22
learns        0.22
learning      0.21
discover      0.21
Discover      0.20
lear          0.20
discovery     0.20
Cur           0.19
Activations Density 0.011%