INDEX
Explanations
instances where a comparison is made between different entities or concepts
occurrences of the word "that"
Negative Logits
lin        -0.76
izont      -0.76
ormons     -0.76
istors     -0.75
istor      -0.73
events     -0.73
stals      -0.71
adia       -0.70
apolis     -0.70
initely    -0.69
Positive Logits
fateful    1.20
pesky      0.96
portion    0.91
cher       0.90
elusive    0.90
aspect     0.85
same       0.81
notion     0.79
chery      0.78
ched       0.76
Activations Density: 0.217%