Explanations
proper nouns, particularly names and titles related to people or entities
Negative Logits
+#+#              -0.85
featureID         -0.64
toHaveBeenCalled  -0.63
haustible         -0.61
jahtera           -0.60
paroisse          -0.60
Sziasztok         -0.60
ophagus           -0.59
RetentionPolicy   -0.58
BASEPATH          -0.58
Positive Logits
is                0.56
also              0.55
ंदीखरीदारी        0.54
समीक्षाओं         0.50
started           0.49
kasarigan         0.47
Apare             0.47
همچنین            0.46
تضيفلها           0.46
began             0.46
Activations Density: 0.381%