INDEX
Explanations
numbers and quantities
occurrences of the word "in."
Negative Logits
DAQ       -0.71
igers     -0.64
ife       -0.63
Quan      -0.61
anamo     -0.60
SPA       -0.58
Panda     -0.56
Sharp     -0.56
Alan      -0.55
istani    -0.55
Positive Logits
fact      0.72
ivari     0.67
comprom   0.66
addition  0.64
etheless  0.61
romy      0.61
contrast  0.61
ptr       0.61
olesc     0.60
yss       0.60
Activations Density 0.633%