INDEX
Explanations
references to personal relationships or belonging
references to personal possession or experiences
Negative Logits
orsi            -0.77
hips            -0.73
iencies         -0.63
Beg             -0.63
establishment   -0.63
ITIES           -0.63
ITY             -0.63
trump           -0.62
atever          -0.61
ulation         -0.60
Positive Logits
craft           1.08
opia            0.95
self            0.92
selves          0.87
field           0.86
heim            0.85
ovie            0.78
fields          0.77
cart            0.75
stic            0.73
Activations Density: 0.007%