INDEX
Explanations
sequences of words separated by strange characters ("Ċ")
occurrences of numerical values or sequences in the text
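The "Ċ" in the first explanation is most likely not a strange character in the underlying text. GPT-2-style byte-level BPE tokenizers display a newline byte as "Ċ" (U+010A) and a space as "Ġ" (U+0120). The page does not name the model or tokenizer, so treating this feature's tokenizer as GPT-2-style is an assumption. The sketch below rebuilds that byte-to-character display mapping. `bytes_to_unicode` and `as_bpe_display` are illustrative helpers written for this note, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte value 0..255 to a printable Unicode character.

    Mirrors the display scheme of GPT-2-style byte-level BPE tokenizers
    (an assumption about this feature's model): printable bytes keep their
    own code point, and the remaining bytes are shifted, in byte order,
    into the range starting at U+0100.
    """
    printable = set(range(ord("!"), ord("~") + 1))
    printable |= set(range(ord("¡"), ord("¬") + 1))
    printable |= set(range(ord("®"), ord("ÿ") + 1))
    mapping = {}
    shift = 0
    for b in range(256):
        if b in printable:
            mapping[b] = chr(b)
        else:
            # Non-printable bytes (including space and newline) are remapped.
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


BYTE_MAP = bytes_to_unicode()


def as_bpe_display(text: str) -> str:
    """Render text the way a byte-level BPE vocabulary displays it."""
    return "".join(BYTE_MAP[b] for b in text.encode("utf-8"))


print(as_bpe_display("line one\nline two"))  # lineĠoneĊlineĠtwo
```

Under this assumption, "words separated by Ċ" would simply be words separated by line breaks.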
Negative Logits
payroll     -0.78
toll        -0.72
civilian    -0.67
caus        -0.66
favor       -0.65
demos       -0.64
FML         -0.64
iod         -0.64
therm       -0.63
rye         -0.63
Positive Logits
Speaking    1.18
British     1.14
Australian  1.11
Despite     1.10
Hum         1.08
Former      1.08
Actor       1.07
Britain     1.07
Parents     1.07
However     1.06
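The page does not say how the negative and positive logit values were computed. On sparse-autoencoder dashboards of this kind, such lists are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. Treat that as an assumption here. The sketch uses small random toy matrices in place of real model weights. `top_bottom_logits`, `decoder_dir`, and `W_U` are hypothetical names.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative tokens for one feature.

    decoder_dir: (d_model,) decoder vector of a single SAE feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix
    vocab:       list of token strings, length vocab_size
    """
    logits = decoder_dir @ W_U            # direct effect of the feature on each token's logit
    order = np.argsort(logits)            # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy stand-ins for real model weights (assumption: random data, not this feature).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]

pos, neg = top_bottom_logits(decoder_dir, W_U, vocab, k=3)
for token, value in pos:
    print(f"{token:<10}{value:6.2f}")
```

Under this reading, a positive value means that activating the feature directly pushes up that token's logit, and a negative value means it pushes the logit down. Effects routed through later layers are ignored.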
Activation Density: 0.176%
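Activation density is usually read as the share of token positions, over some evaluation dataset, on which the feature is active. The sketch below assumes "active" means an activation strictly above zero. The page does not state its threshold or dataset. `activation_density` is a hypothetical helper, and the toy array is constructed only to reproduce the 0.176% figure.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy example (assumption: 100,000 tokens, the feature fires on 176 of them).
acts = np.zeros(100_000)
acts[:176] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.176%
```

At 0.176%, the feature fires on roughly 1 token in 570, which makes it a sparse feature.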