INDEX
Explanations
phrases related to significant actions or events, often involving some kind of emotional weight
references to specific actions or events labeled as "acts."
Negative Logits
corners      -0.81
ceilings     -0.73
Flavoring    -0.69
edges        -0.67
Pavilion     -0.66
sshd         -0.65
Pyramid      -0.63
Wheat        -0.63
walls        -0.63
aways        -0.61
Positive Logits
luck         0.90
sabotage     0.85
kindness     0.80
heroism      0.78
omission     0.76
aggression   0.71
desperation  0.69
asses        0.68
OUP          0.67
vandalism    0.67
Activations Density 0.060%