INDEX
Explanations
references to categories or types of items or concepts
Negative Logits
featureID       -0.63
contentLoaded   -0.61
OGND            -0.59
transQ          -0.59
SourceChecksum  -0.57
waitKey         -0.55
bağlantılar     -0.54
terciopelo      -0.53
propOrder       -0.53
angliski        -0.53

Positive Logits
ztály            0.42
entang           0.40
of               0.38
łgorzata         0.36
those            0.36
Huck             0.35
Ashley           0.35
Hayley           0.35
ombic            0.35
Katie            0.33
Activations Density: 0.012%