INDEX
Explanations
phrases related to ongoing events or situations
expressions indicating awareness and understanding of significant events or situations
Negative Logits
abad       -0.73
aments     -0.72
lication   -0.70
Los        -0.70
few        -0.70
hedon      -0.69
ttes       -0.69
inker      -0.66
ikers      -0.66
gart       -0.65
Positive Logits
piring      0.80
happening   0.78
pires       0.77
entails     0.71
transpired  0.70
dstg        0.67
litter      0.65
Happ        0.64
cov         0.63
happen      0.62
Activations Density 0.210%