Explanations
the word "once" with high activation values
instances of the word "once"
Negative Logits
Token                       Logit
alf                         -0.76
LCS                         -0.76
MSN                         -0.74
DIT                         -0.74
IELD                        -0.73
RIC                         -0.72
NRS                         -0.72
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    -0.71
âĸº                         -0.71
ãĥ¼                         -0.70
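The last three negative-logit tokens above look garbled, but they are consistent with how byte-level BPE vocabularies print raw UTF-8 bytes. The sketch below assumes the tokens come from a GPT-2 style byte-level vocabulary (the page does not name the tokenizer, so this is an assumption) and rebuilds that byte-to-character table to recover the readable characters. The function names `byte_decoder` and `decode_token` are hypothetical helpers, not part of any library.

```python
def byte_decoder():
    """Rebuild a GPT-2 style byte-level table: display character -> raw byte.

    Assumption: the dashboard's vocabulary uses the GPT-2 convention, where
    printable bytes map to themselves and the rest map to code points 256+.
    """
    printable = set(range(33, 127)) | set(range(161, 173)) | set(range(174, 256))
    table = {}
    shifted = 0
    for byte in range(256):
        if byte in printable:
            table[chr(byte)] = byte
        else:
            # Non-printable bytes are assigned the next free code point above 255.
            table[chr(256 + shifted)] = byte
            shifted += 1
    return table


def decode_token(token):
    """Turn a byte-level token string back into ordinary text."""
    table = byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e2\u0136\u0122", "\u00e2\u0138\u00ba", "\u00e3\u0125\u00bc"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the three tokens decode to a box-drawing horizontal line (repeated eight times in the listed token), a right-pointing triangle, and the katakana prolonged sound mark.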
Positive Logits
Token       Logit
again       0.82
bitten      0.76
Upon        0.75
handedly    0.69
falls       0.68
Bucc        0.66
glimps      0.66
soever      0.65
tasted      0.64
glance      0.63
Activations Density: 0.023%
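As a rough illustration of what a density figure like this measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of tokens with activation above a threshold"; the page does not state its exact definition, and `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its value is
    strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    sample = [0.0, 0.0, 0.0, 1.5]  # toy values, not taken from the dashboard
    print(f"{activation_density(sample):.3%}")
```

Read this way, a density of 0.023% would correspond to the feature firing on roughly 23 of every 100,000 tokens.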