INDEX
Explanations
personal pronouns followed by words indicating actions or situations
repeated references to the word "we."
Negative Logits
Pwr          -0.76
trak         -0.69
Publication  -0.67
oute         -0.66
Mehran       -0.65
fleet        -0.60
fect         -0.59
bay          -0.59
cart         -0.58
Watt         -0.57
Positive Logits
're          1.12
IRD          0.94
asel         0.92
aning        0.92
athered      0.89
selves       0.85
asley        0.84
eping        0.84
bsite        0.83
've          0.82
Activation Density: 0.225%