INDEX
Explanations
pronouns followed by verbs indicating knowledge or perception
pronouns indicating relationships and connections between individuals
Negative Logits
Token         Logit
Spur          -0.72
vine          -0.71
heny          -0.65
Surge         -0.64
Dimensions    -0.64
Definitions   -0.60
Proced        -0.59
Illustrated   -0.59
itch          -0.59
steroids      -0.58
Positive Logits
Token         Logit
sembly        0.86
mble          0.82
igl           0.74
self          0.74
cius          0.73
azi           0.71
ega           0.69
condem        0.68
ngth          0.68
chwitz        0.68
Activations Density: 0.254%