INDEX
Explanations
phrases related to a specific type or category of something
phrases that express types or categories of things
Negative Logits
oons          -0.84
nets          -0.80
inches        -0.78
ials          -0.77
Sands         -0.76
andals        -0.74
ulations      -0.73
encers        -0.72
mins          -0.72
ues           -0.71
Positive Logits
poetic         0.83
reckoning      0.82
resemblance    0.80
limbo          0.80
equilibrium    0.77
thing          0.76
existential    0.76
magic          0.75
sleeper        0.74
niche          0.74
Activations Density 0.044%