INDEX
Explanations
personal experiences and statements made by individuals
references to identities and social roles
Negative Logits
Token       Logit
shipment    -0.66
waivers     -0.62
Voc         -0.61
Scher       -0.60
ãĥ«         -0.60
USS         -0.60
Mull        -0.60
Intent      -0.60
eruption    -0.59
Trap        -0.58
Positive Logits
Token           Logit
myself          0.98
ourselves       0.98
fortunate       0.78
ðŁij            0.73
"$:/            0.70
£ı              0.70
lucky           0.69
proud           0.69
instinctively   0.66
privileged      0.66
Activations Density: 0.535%