INDEX
Explanations
phrases related to specific actions or events
the word "but"
Negative Logits
Aval            -0.78
ingred          -0.69
visor           -0.67
masses          -0.65
transcription   -0.64
scen            -0.64
harmless        -0.63
metic           -0.63
managerial      -0.63
izational       -0.63
Positive Logits
ERC             0.81
Trident         0.77
CBC             0.76
Sapphire        0.71
ARI             0.71
aren            0.70
Hamilton        0.69
AG              0.69
enna            0.69
ELL             0.67
Activations Density 0.000%