INDEX
Explanations
phrases involving legal or political entities and actions
instances of punctuation, specifically periods and associated abbreviations
Negative Logits
levers        -0.70
opera         -0.69
friendly      -0.65
universally   -0.62
fibre         -0.60
provoking     -0.59
competence    -0.58
theat         -0.58
emot          -0.58
fer           -0.58
Positive Logits
Va            1.04
Gray          0.96
rex           0.90
Luffy         0.80
RAW           0.76
Rockefeller   0.76
C             0.73
anton         0.70
iamond        0.69
Roosevelt     0.69
Activations Density: 0.025%