INDEX
Explanations
phrases indicating significance or importance
Negative Logits
CHAT            -0.81
heid            -0.75
RAW             -0.74
DragonMagazine  -0.74
isks            -0.72
IRE             -0.68
BILITIES        -0.67
erity           -0.67
few             -0.66
UTH             -0.66
Positive Logits
portraying      1.15
explaining      1.09
educating       1.07
integrating     1.03
keeping         1.01
balancing       1.01
justifying      1.00
illustrating    1.00
distinguishing  0.96
showcasing      0.95
Activations Density 0.041%