Explanations
phrases related to causes or motivations
rhetorical questions and statements that introduce a topic
Negative Logits
arat        -0.77
enburg      -0.64
anchester   -0.63
wal         -0.63
ws          -0.62
oeuv        -0.62
holiday     -0.61
yon         -0.59
kel         -0.59
ndum        -0.59
Positive Logits
namely       0.77
TEXTURE      0.61
inclusion    0.60
Qualcomm     0.60
consistency  0.58
Parm         0.58
longevity    0.57
ãĢij          0.56
DOS          0.56
graphene     0.55
Activation Density: 0.393%