Explanations
instances of significant events or actions that have a lasting impact
Negative Logits
Token           Logit
"]=>            -0.60
Gleaming        -0.59
Cotton          -0.58
Canter          -0.56
lord            -0.54
Dragons         -0.54
crop            -0.53
gat             -0.52
Detail          -0.52
weeds           -0.51
Positive Logits
Token           Logit
albeit          1.12
respectively    0.90
albeit          0.89
alas            0.76
incarn          0.71
moreover        0.71
according       0.69
isively         0.68
theoretically   0.68
perhaps         0.67
Activations Density: 0.191%