Explanations
words related to deep significance or impact
references to significant or impactful concepts
Negative Logits
©¶æ       -0.87
phis      -0.73
rera      -0.69
oaded     -0.65
Garrison  -0.63
avers     -0.62
annis     -0.62
hops      -0.62
Fighters  -0.60
matched   -0.60
Positive Logits
est            0.87
ly             0.79
ness           0.78
philosophical  0.73
sadness        0.71
impact         0.69
itudinal       0.69
edIn           0.69
amounts        0.65
ulty           0.65
Activations Density 0.013%