Explanations
mentions of legal and political discussions, especially related to immigration and citizenship
Negative Logits
stray         -0.80
administr     -0.79
territ        -0.78
repatri       -0.75
payday        -0.75
synerg        -0.73
diving        -0.73
timely        -0.73
wiser         -0.71
downstream    -0.71
Positive Logits
Narr          1.38
SOURCE        1.17
Interview     1.11
Anyway        1.10
LES           1.06
Leary         1.05
Original      1.05
Instruct      1.05
((            1.04
Reward        1.03
Activations Density 0.077%