INDEX
Explanations
phrases that emphasize the importance of personal experiences and feelings
Negative Logits
borg          -0.20
464           -0.16
Pou           -0.15
etur          -0.15
illin         -0.14
915           -0.14
rad           -0.14
lands         -0.14
Montgomery    -0.14
Shepard       -0.14
Positive Logits
æ§            0.17
furt          0.17
matter        0.16
ourg          0.16
happens       0.16
illy          0.15
appen         0.15
Cheers        0.15
encil         0.15
wright        0.15
Activations Density: 0.182%