INDEX
Explanations
phrases indicating a shift in focus or topic
phrases that express contrast or separation in relation to other concepts
Negative Logits
oping     -0.63
cat       -0.63
iser      -0.62
士        -0.61
pees      -0.61
eway      -0.61
tumble    -0.60
aimon     -0.58
ebra      -0.57
eries     -0.57
Positive Logits
heid      1.29
isphere   0.88
comings   0.87
ments     0.82
icularly  0.82
Ħ¢        0.75
landish   0.74
lihood    0.73
ractor    0.72
ional     0.72
Activations Density: 0.018%