INDEX
Explanations
greetings and addresses to a group
references to a general audience or group of people
Negative Logits
Kamp        -0.67
himself     -0.64
Stalin      -0.63
Churchill   -0.61
course      -0.58
hibition    -0.58
fasc        -0.58
Edu         -0.57
Worse       -0.57
Niet        -0.57
Positive Logits
rats        0.78
Interested  0.77
members     0.74
Ü           0.74
hesda       0.72
uesday      0.69
gathered    0.68
intern      0.66
RIP         0.66
alike       0.65
Activations Density: 0.110%