INDEX
Explanations
the word "simple" accompanied by a high activation value
the term "simple" in various contexts
Negative Logits
largeDownload       -0.79
raints              -0.74
vance               -0.71
ingle               -0.70
inburgh             -0.69
hips                -0.69
igham               -0.68
extensively         -0.68
vigorously          -0.66
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.66
Positive Logits
tons                1.20
minded              0.93
wallet              0.92
ton                 0.86
json                0.86
syrup               0.81
arithmetic          0.80
straightforward     0.80
coded               0.79
ified               0.78
Activations Density 0.023%
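
As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 23 of which activate the feature.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.023% corresponds to the feature firing on roughly 23 out of every 100,000 tokens.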