Explanations
texts denoted by Ċ that introduce a new section or paragraph
structured discussions about various topics or issues
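The symbol Ċ in the first explanation is most likely not a literal character in the text. Assuming the underlying model uses a GPT-2-style byte-level BPE tokenizer (the page itself does not say so), Ċ is how the newline byte is displayed, which would fit "introduce a new section or paragraph". The sketch below rebuilds that byte-to-symbol table to show where Ċ comes from; the function and variable names are illustrative only.

```python
def bytes_to_unicode():
    """Rebuild the byte -> display-character table used by GPT-2-style
    byte-level BPE tokenizers (assumed here, not confirmed by the page)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes (controls, space, ...) are shifted to 256+.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


table = bytes_to_unicode()
newline_symbol = table[ord("\n")]  # display form of the newline byte
space_symbol = table[ord(" ")]     # display form of the space byte
print(newline_symbol, space_symbol)
```

Under that assumption, the explanation reads as "text that follows a newline and opens a new section or paragraph".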
Negative Logits
disapprove        -0.67
boro              -0.67
equip             -0.64
manoeuv           -0.64
withdraw          -0.63
undle             -0.62
utical            -0.62
utic              -0.61
inactive          -0.61
Vald              -0.61
Positive Logits
³³³               1.07
³³³³              1.06
³³³³³³³³³³³³³³³³  0.99
³³³³³³³³          0.91
³³                0.88
Consider          0.84
Firstly           0.81
Reason            0.81
Anyway            0.81
Specifically      0.80
Activations Density 0.675%
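The page does not define the density figure. A common reading, assumed here, is the fraction of tokens on which the feature has a non-zero activation, so 0.675% would mean roughly one token in 148. The sketch below computes that fraction on synthetic data; `activation_density`, its `threshold` parameter, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "density"; the dashboard may compute it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 27 active tokens out of 4000.
sample = [0.0] * 4000
for i in range(27):
    sample[i * 100] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

With the synthetic counts chosen above, the result matches the 0.675% shown on the page, but the real token counts behind that figure are not given.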